VMGLOM gene and its mutations causing disorders with a vascular component

ABSTRACT

The present invention relates to genes responsible for disorders with a vascular component, the identification of mutations in said genes and the detection of their sequences as well as methods for detection and treatment for disorders with a vascular component. This invention further relates to proteins encoded by said genes and their applications.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is the U.S. National Phase under 35 U.S.C. §371 of International Application No. PCT/EP01/01760, filed Feb. 16, 2001, designating the United States and published in English, which claims priority to European Application No. 00870022.1, filed Feb. 16, 2000, U.S. Provisional Application No. 60/195,577, filed Apr. 6, 2000 and European Application No. 00870320.9, filed Dec. 22, 2000.

FIELD OF THE INVENTION

The present invention relates to the field of molecular biology. More particularly, the present invention relates to the identification of new genes and to the detection and treatment of venous malformations with glomus cells. The present invention relates to the identification of genes residing in the VMGLOM locus, responsible for disorders with a vascular component, the identification of mutations in said genes and the detection of their sequences, as well as methods of treatment for disorders with a vascular component based on said gene sequences.

BACKGROUND OF THE INVENTION

Venous malformations (VMs) are bluish-purple lesions that can be single or multiple (Vikkula et al. 1998). They are most often localized on the skin and mucous membranes. In two families in which these lesions are inherited as an autosomal dominant trait, a locus (VMCM1) was identified on chromosome 9p21 that is linked to the phenotype (Boon et al. 1994; Gallione et al. 1995). It was found that the mutation in this locus is in the gene encoding the endothelial-specific receptor tyrosine kinase TIE-2. The R849W mutation in the intracellular kinase domain of TIE-2 leads to hyper-activation of the receptor in a ligand-independent manner (Vikkula et al. 1996). Another amino acid substitution, Y897S, identified in a separate family, seems to have a similar effect (Calvert et al. 1999).

Recently a second locus (VMGLOM) was identified on chromosome 1p21-22 for a subtype of VMs called “glomangiomas” because of the presence of undifferentiated smooth-muscle cells (“glomus cells”) in histological slides (Boon et al. 1999). Three positional candidate genes: DR1 (depressor of transcription 1), TGFBR3 (transforming growth factor-β receptor, type 3) and TFA (tissue factor) were screened and excluded. The identification of a candidate gene in the 5 Mbp VMGLOM locus would allow detection of mutations involved with venous malformations. It is thus an aim of the present invention to provide nucleic acid sequences representing genes involved with disorders with a vascular component as well as methods for diagnosis and treatment of disorders with a vascular component.

SUMMARY OF THE INVENTION

The present invention relates to an isolated nucleic acid molecule selected from any of the following:

-   a) a nucleic acid molecule encoding a human polypeptide having a sequence which is more than 68%, preferably more than 70%, more preferably more than 80% homologous to the sequence as represented in SEQ ID NO 2,
-   b) a nucleic acid molecule encoding a human polypeptide having an amino acid sequence as represented in SEQ ID NO 2 or a shorter fragment thereof as represented in SEQ ID NO 4,
-   c) a nucleic acid molecule having a nucleotide sequence as represented in SEQ ID NO 1 or 3,
-   d) a nucleic acid molecule encoding a mammalian non-human polypeptide which is a biological equivalent of a human polypeptide as mentioned in a) or b),
-   e) a nucleic acid molecule encoding a mouse polypeptide having an amino acid sequence as represented in SEQ ID NO 6 or 8, and,
-   f) a nucleic acid molecule having a nucleotide sequence as represented in SEQ ID NO 5 or 7,

or the complement thereof.

Said nucleic acid sequences represent the genes for venous malformations with glomus cells and for other disorders with a vascular component, or synthetic versions thereof.

The present invention further provides a nucleic acid molecule as defined above having a nucleotide sequence modification, with said modification resulting in patients bearing said modification in their genome having disorders with a vascular component.

The present invention further relates to a nucleic acid molecule as defined here above, wherein said nucleotide sequence modification is selected from the group of nucleotide mutations consisting of point mutations, deletions, insertions, rearrangements, translocations and other mutations and preferably selected from the mutations as indicated in Table 8 or 9, such that the resulting nucleic acid sequence is altered.

The present invention also relates to a probe or primer containing a sequence comprising at least 15 contiguous nucleotides of a nucleic acid sequence as defined above.
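The 15-contiguous-nucleotide criterion can be checked mechanically. The following is a minimal sketch in Python, given for illustration only and not as part of the described methods: the function names and the threshold parameter are assumptions, and the caller supplies the reference sequence (or its complement) as a plain string.

```python
def longest_shared_run(oligo: str, target: str) -> int:
    """Length of the longest contiguous stretch of `oligo` that occurs in `target`."""
    oligo, target = oligo.upper(), target.upper()
    best = 0
    for start in range(len(oligo)):
        # Only look for stretches longer than the best one found so far.
        end = start + best + 1
        while end <= len(oligo) and oligo[start:end] in target:
            best = end - start
            end += 1
    return best


def has_contiguous_match(oligo: str, target: str, min_len: int = 15) -> bool:
    """True if `oligo` comprises at least `min_len` contiguous nucleotides of `target`."""
    return longest_shared_run(oligo, target) >= min_len


if __name__ == "__main__":
    # Made-up sequence for demonstration; it is not one of the SEQ ID NOs.
    reference = "ATGGCGTACGTTAGCCTAGGCTTAACGGATCCGATTACGCTA"
    candidate = reference[5:22]  # a 17-nucleotide stretch of the reference
    print(has_contiguous_match(candidate, reference))
```

To test an oligonucleotide against the opposite strand, the same function can be called with the complement of the reference sequence as `target`.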

The present invention also relates to an isolated polypeptide selected from the following:

-   a) a human polypeptide having a sequence which is more than 68%, preferably more than 70%, more preferably more than 80% homologous to the sequence as represented in SEQ ID NO 2,
-   b) a human polypeptide having an amino acid sequence as represented in SEQ ID NO 2 or a shorter fragment thereof as represented in SEQ ID NO 4,
-   c) a mammalian non-human polypeptide which is a biological equivalent of a human polypeptide as mentioned in a) or b), and,
-   d) a mouse polypeptide having an amino acid sequence as represented in SEQ ID NO 6 or 8,

or a functional part thereof.

The present invention also relates to a nucleic acid or polypeptide molecule as defined above for use as a medicament or a diagnostic kit.

The present invention also relates to the use of a molecule as defined above for the preparation of a medicament for preventing, treating or alleviating disorders with a vascular component or for the preparation of a diagnostic kit for detecting disorders with a vascular component.

The present invention further relates to a method for detecting the presence of mutations in a nucleic acid sequence as defined above in a sample containing nucleic acids.

The present invention also relates to a method for diagnosis of disorders with a vascular component in a patient comprising detecting a mutation in a nucleic acid sequence as defined above or detecting a nucleic acid as defined above.

The present invention also relates to a method for screening molecules for preventing, treating or alleviating disorders with a vascular component comprising the steps of:

-   a) contacting the molecule to be screened with a nucleic acid as defined above, or with a polypeptide as defined above, and,
-   b) detecting the formation of a complex or detecting the interaction between said molecule and said nucleic acid or said polypeptide.

The present invention relates to a molecule identifiable by a method as defined above.

The present invention relates to a method for the production of a composition comprising the steps of producing a compound identifiable by a method as defined above and mixing said identified compound with a pharmaceutically acceptable carrier.

The present invention also relates to an antibody characterized in that it specifically recognises a polypeptide as defined above, or an antigenic fragment thereof.

The present invention also relates to a DNA construct comprising at least part of a nucleic acid as defined above, wherein the coding sequence of said nucleic acid is operably linked to a control sequence enabling the expression of the coding sequence of said nucleic acid by a specific host.

The present invention also relates to a host cell transformed with a DNA construct as defined above.

The present invention also relates to a recombinant polypeptide encoded by a nucleic acid as defined above or part thereof, said recombinant polypeptide being produced by:

-   a) culturing said transformed cellular host as defined above under conditions which allow the expression and possibly secretion of the encoded polypeptide, and
-   b) optionally, recovering the expressed polypeptide from said culture.

The present invention also relates to a method for treating or alleviating disorders with a vascular component comprising the use of a molecule which interferes with the expression of a polypeptide as defined above in a patient.

The present invention also relates to a method for the diagnosis of disorders with a vascular component in a patient comprising the use of at least a nucleic acid sequence as defined above or a probe or primer as defined above or an antibody as defined above. The present invention also relates to a kit for the diagnosis of disorders with a vascular component in a patient comprising at least a probe or primer as defined above or an antibody as defined above.

DETAILED DESCRIPTION OF THE INVENTION

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology and recombinant DNA technology, which are within the skill of the art. Such techniques are explained fully in the literature.

All publications cited herein are hereby incorporated by reference in their entirety. In describing the present invention, the following terms will be employed, and are intended to be defined as indicated below.

The term “disorders with a vascular component” refers to disorders and diseases in which there is altered vascular development, growth and/or maintenance or other abnormality, or altered size, structure, number etc. of blood vessels, such as in vascular anomalies (including several different types; for example hemangiomas, and arterial, capillary, lymphatic, venous and combined malformations), in other congenital and acquired vascular problems, such as aortic dilatation, coarctation of aorta, annuloaortic ectasia, angiopathies, occlusive vascular disorders, atherosclerotic vascular disease, ischemic heart disease, limb ischemia etc., as well as in disorders in which the vascular phenotype might not be the primary cause of the disease, such as in tumor induced angiogenesis, diabetic retinopathy, rheumatoid arthritis, etc.

The term “vascular” refers to the whole vascular system, i.e. venous, capillary, arterial and lymphatic vessels.

The term “nucleic acid” refers to genomic or complementary DNA or RNA, amplified versions thereof, or the complement thereof. The term nucleic acid may refer to a complete gene or a part thereof and may refer to genes (including introns) or synthetic versions thereof.

The term “gene” as used herein refers to any DNA sequence comprising one to several operably linked DNA fragments such as a promoter and a 5′ untranslated region (the 5′UTR), a coding region (which may or may not code for a protein), and an untranslated 3′ region (3′UTR) comprising a polyadenylation site. Typically in mammalian cells, the 5′UTR, the coding region and the 3′UTR (together referred to as the transcribed DNA region) are transcribed into an RNA which, in the case of a protein encoding gene, is translated into a protein. A gene may include additional DNA fragments such as, for example, introns.

The preferred mutations in said genes are given in Table 8.

The term “complement” refers to a nucleotide sequence which is complementary to an indicated sequence and which is able to hybridize to the indicated sequence.
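As an illustration of this definition, the complementary strand of a DNA sequence can be derived by Watson-Crick base pairing and read in the 5′ to 3′ direction (the reverse complement). The sketch below is a simplified Python example; the function names are assumptions, only unambiguous DNA bases are handled, and real hybridization also tolerates some mismatches depending on the conditions used.

```python
# Watson-Crick pairing for unambiguous DNA bases (upper and lower case).
_PAIRING = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_PAIRING)[::-1]


def is_perfect_complement(strand_a: str, strand_b: str) -> bool:
    """True if the two strands can pair along their full length without mismatch."""
    return strand_a.upper() == reverse_complement(strand_b).upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(is_perfect_complement("ATGC", "GCAT"))
```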

The term “primer” refers to a single stranded nucleotide sequence capable of acting as a point of initiation for synthesis of a primer extension product which is complementary to the nucleic acid strand to be copied. The length and the sequence of the primer must be such that they allow priming of the synthesis of the extension products. Preferably the primer is about 5-50 nucleotides long. The specific length and sequence will depend on the complexity of the required DNA or RNA targets, as well as on the conditions of primer use such as temperature and ionic strength.

The fact that amplification primers do not have to match exactly with corresponding template sequence to warrant proper amplification is amply documented in the literature (see for instance Kwok et al., 1990).

The term “probe” according to the present invention refers to a single-stranded oligonucleotide sequence which is designed to specifically hybridize to any of the polynucleic acids of the invention. The probes used in the process of the invention can be produced by any method known in the art, such as cloning of recombinant plasmids containing inserts including the corresponding nucleotide sequences, if need be, by cleaving the latter out from the cloned plasmids upon using the appropriate nucleases and recovering them (e.g., by fractionation according to molecular weight). The probes can also be synthesized chemically, for instance, by the conventional phospho-triester method.

The probes of the invention can optionally be labelled using any conventional label. Primers and probes according to the present invention may also be directed against the introns of the nucleic acid sequences as defined above. The probes according to the present invention preferably hybridize to a region of a nucleic acid molecule according to the claims comprising a nucleotide sequence modification (mutation) resulting in patients bearing said modification in their genome having disorders with a vascular component.

The primers according to the present invention may specifically bind to a region of a nucleic acid molecule according to the claims comprising a nucleotide sequence modification (mutation) resulting in patients bearing said modification in their genome having disorders with a vascular component. By binding to said region, said primers are able to differentially amplify a wild-type and a mutated nucleic acid of the invention.

The term “mutation” in the context of the present invention refers to any change in the identity of a nucleotide or a change in the succession of nucleotides in the nucleic acid strand(s) which may occur, including nonsense, frameshift and missense mutations, small insertions (e.g. 1, 2, 3, 4, 5 or more nucleotides) or deletions (e.g. 1, 2, 3, 4, 5 or more nucleotides), large deletions encompassing substantial parts of the gene as well as encompassing the total gene, translocations, and any other change known to the person skilled in the art.
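To make the nonsense, missense and frameshift categories concrete, the following Python sketch classifies a single-nucleotide substitution in a coding sequence and a small insertion or deletion by its effect on the reading frame. It is an illustrative simplification, not a method of the invention: the function names are assumptions, the standard genetic code is used, the coding sequence is assumed to start in frame at position 0, and the “silent” category (no amino acid change) is added for completeness.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(cds: str, pos: int, new_base: str) -> str:
    """Classify replacing the base at 0-based index `pos` of `cds` with `new_base`."""
    cds, new_base = cds.upper(), new_base.upper()
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"  # the substitution creates a stop codon
    return "missense"      # any other amino acid change (including loss of a stop)


def classify_indel(length_change: int) -> str:
    """Classify an insertion (+n) or deletion (-n) of n nucleotides."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"


if __name__ == "__main__":
    example_cds = "ATGTGGAAA"  # made-up sequence encoding Met-Trp-Lys
    print(classify_substitution(example_cds, 5, "A"))
    print(classify_indel(1))
```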

The term “translocation” means an event in which part of one chromosome has broken off and become attached to another chromosome or part thereof.

The present invention also relates to a method for detecting the presence of mutations in a nucleic acid according to the invention in a sample containing nucleic acids comprising the steps of:

-   a) possibly isolating and purifying the nucleic acids from said sample by means of methods known in the art,
-   b) contacting said nucleic acids of said sample with at least a probe or a primer as defined above,
-   c) detecting said wild-type or mutated nucleic acid of the invention by means of specific hybridization, or in the alternative,
-   d) detecting said wild-type or mutated nucleic acid of the invention by means of an amplification reaction such as PCR possibly combined with for instance a hybridization or sequencing reaction.

The term “amplification” used in the context of the present invention refers to polymerase chain reaction (PCR) or any other type of nucleic acid amplification method, such as ligase chain reaction (LCR; Landgren et al., 1988; Wu and Wallace, 1989; Barany, 1991), nucleic acid sequence based amplification (NASBA; Guatelli et al., 1990; Compton, 1991), transcription-based amplification system (TAS; Kwoh et al., 1989), strand displacement amplification (SDA; Duck, 1990; Walker et al., 1992) or amplification by means of Qβ replicase (Lizardi et al., 1988; Lomeli et al., 1989) or any other suitable method to amplify nucleic acid molecules. The amplification reaction is preferably repeated between 20 and 70 times, advantageously between 25 and 45 times.
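The effect of the number of repetitions on yield in a PCR can be estimated with the usual idealized model, in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency between 0 and 1. The Python sketch below is illustrative only: the function name and the efficiency values are assumptions, and the model ignores the plateau reached in real reactions.

```python
def amplified_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles: N = N0 * (1 + E) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare the bounds of the advantageous range of repetitions (25 and 45)
    # at perfect and at a hypothetical 90% per-cycle efficiency.
    for n in (25, 45):
        print(n, amplified_copies(1, n), amplified_copies(1, n, efficiency=0.9))
```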

In another embodiment of the present invention, a molecule according to the invention can be used as a medicament or in a diagnostic kit.

In a more preferred embodiment, said medicament is used for the diagnosis, prevention, alleviation or treatment of disorders with a vascular component or for the preparation of a diagnostic kit for detecting disorders with a vascular component. In yet another preferred embodiment, said molecule according to the invention can be used for the preparation of a medicament for preventing, treating or alleviating disorders in which an alteration of vascular smooth muscle cell phenotype is needed. As illustrated in the examples, due to the known interaction between FKBP12 and the TGFβ type I receptor, it is likely that glomulin, via FKBP12, modulates TGFβ receptor signaling. Vascular smooth muscle cell differentiation has been shown to be induced by TGFβ. As “phenotypic modulation” of vascular smooth muscle cells has been shown in several conditions, such as in atherosclerotic plaque, it is also likely that glomulin, via TGFβ, modulates this phenotypic change. Thus, glomulin may have use as such or as a target, when alteration of (vascular) smooth muscle cell phenotype is needed.

According to yet another preferred embodiment, said molecule according to the invention can be used for the preparation of a medicament for preventing, treating or alleviating varicosities. This is again illustrated in the examples where Western blot data show glomulin expression in many veins, and varicose veins are encountered in families with inherited glomuvenous malformations.

According to another preferred embodiment, said molecule according to the invention can be used for the preparation of a medicament for preventing, treating or alleviating cardiopathies or cardiomyopathies. The inventors found high RNA expression levels supported by the glomulin protein detection by Western blot analysis in heart tissue, underlining the fact that glomulin is likely to have an important function in heart. Several clinical entities affecting the heart and associated tissues (cardiopathies or cardiomyopathies) are known, and may encounter alterations in glomulin function, which thus can serve as target for e.g. diagnosis, treatment and prevention.

According to another preferred embodiment, said molecule according to the invention can be used for the preparation of a medicament for preventing, treating or alleviating cerebral disorders. As illustrated in the examples, Northern blot analysis has also detected high expression of glomulin in the brain. As brain vessels are not specifically rich in smooth muscle cells, but rather pericytes, this expression may originate from the cerebral vascular endothelial cells and/or pericytes, and/or parenchymal cells. Glomulin is likely to have a special function in the brain, and thus may serve e.g. in the diagnosis, treatment and prevention of cerebral disorders.

Other related disorders which can be prevented and/or treated within the scope of this invention are disorders treatable by modulation of the immune response. Due to the interaction of FAP48 with FKBP59 and FKBP12, glomulin is likely to have a similar action. Thus, glomulin may act as an immunomodulator, and have use in the treatment of various conditions in which modulation of the immune response is needed, such as e.g. in atopic dermatitis.

Finally, said molecule according to the invention can also be used for preventing, treating or alleviating cancer. Indeed, Northern blot analysis has detected expression of glomulin in cancers, such as cervical adenocarcinoma (HeLa S3), lung carcinoma epithelial cell line (A549), leukemias (K-562, MOLT-4, and HL-60), Burkitt's lymphomas (Raji and Daudi) and colorectal adenocarcinoma epithelial cell line (SW480). These cell lines are not vascular endothelial cells or vascular smooth muscle cells. Thus, glomulin expression or concentration may be altered in cancers, and glomulin may thus serve as a target, e.g. for diagnosis, treatment and prevention.

Also according to the invention, the identification of the presence or absence of said mutation in any of the methods of the invention can be done by direct sequencing or by micro array methods. Preferably, the present invention further relates to a method for detecting the presence of mutations in a nucleic acid sequence as defined above in a sample containing nucleic acids comprising the steps of:

-   a) contacting said nucleic acids of said sample with at least one probe or primer as defined above, with said probe or primer being preferably able to detect a nucleotide sequence modification as defined above,
-   b) detecting said wild-type or mutant nucleic acid of the invention by specific hybridisation or amplification, and,
-   c) possibly sequencing said amplification products of step b).

Other methods can also be used to identify such mutations, including methods such as STS-PCR, contour-clamped homogeneous electric field (CHEF) gel electrophoresis, restriction mapping, hybridization, Southern and Northern blotting, FISH analysis, mismatch cleavage, single strand conformation polymorphism (SSCP) or any other method known in the art. The diagnostic methods of the present invention also include segregation analysis, involving PCR-based genotyping and/or haplotyping methods. The diagnostic methods according to the present invention also include methods based on direct sequencing or CAS (coupled amplification and sequencing) optionally combined with additional analytic steps as known in the art, such as ligation analysis to detect and evaluate mutations.

The terms “protein” of the invention and “polypeptide” of the invention are equivalent and interchangeable. These terms also capture proteins substantially homologous and functionally equivalent to native proteins. Thus, the term encompasses modifications, such as deletions, additions and substitutions (generally conservative in nature), to the native sequences, as long as the biological activity of said polypeptide is not destroyed. Such modifications of the primary amino acid sequence may result in polypeptides which have enhanced activity as compared to the native sequence. These modifications may be deliberate, as through site-directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the protein. All of these modifications are included, so long as biological activity is retained.

Two nucleotide or amino acid sequences are “substantially homologous” according to the present invention when at least about 65% (preferably at least about 80% to 90%, and most preferably at least about 95%) of the nucleotides or amino acids match over a defined length of the molecule. As used herein, substantially homologous also refers to sequences showing identity to the specified DNA or polypeptide sequence. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system.
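The matching criterion above can be illustrated by computing the percentage of identical positions over a defined length. The Python sketch below is a simplified example: it assumes the two sequences have already been aligned to equal length (with “-” marking gaps), and the function names and default threshold argument are assumptions chosen to mirror the 65% figure in the definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions at which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    # A gap never counts as a match, even when both sequences have one.
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def substantially_homologous(aligned_a: str, aligned_b: str, threshold: float = 65.0) -> bool:
    """True if at least `threshold` percent of positions match over the aligned length."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up sequences for demonstration only.
    print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
    print(substantially_homologous("ACGTACGTAC", "ACGTACGTTT", threshold=95.0))
```

The same functions apply to amino acid sequences written in one-letter code, since only position-wise equality is tested.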

The term “functionally equivalent” means that the amino acid sequence of the subject protein is one that will give a defined biological activity, equivalent to or better than the biological activity of a non-mutated protein of the invention.

An “antigen” refers to a molecule containing one or more epitopes that will stimulate a host's immune system to make a humoral and/or cellular antigen-specific response. The term is also used interchangeably with “immunogen”.

A “hapten” is a molecule containing one or more epitopes that does not stimulate a host's immune system to make a humoral or cellular response unless linked to a carrier.

The term “epitope” refers to the site on an antigen or hapten to which a specific antibody molecule binds. The term is also used interchangeably with “antigenic determinant” or “antigenic determinant site.”

The term “functional part” of a polypeptide or protein refers to a (poly)peptide or amino acid sequence, respectively, which has at least one identical or at least one equivalent biological activity compared to the protein it is derived from. Such parts will usually be at least about 10 amino acids in length, and preferably at least about 15 or 20 amino acids in length. There is no critical upper limit to the length of the fragment, which could comprise nearly the full length of the protein sequence. The terms “polypeptide” and “protein” include oligopeptides, protein fragments, analogs, muteins, fusion proteins and the like.

By “isolated protein” is meant a protein separate and discrete from a whole organism (live or killed) with which the protein is normally associated in nature. Thus, a protein produced synthetically or recombinantly would constitute an isolated protein.

“Recombinant” polypeptides refer to polypeptides produced by recombinant DNA techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide. “Synthetic” polypeptides are those prepared by chemical synthesis.

A “replicon” is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control.

A “vector” is a replicon, such as a plasmid, phage, or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.

The term “comprising” within the context of the present invention is to be understood as containing at least an item or step as claimed but possibly also containing more than that item or step. Comprising thus constitutes open language.

In order to identify additional genes encoding the proteins of the present invention, and particularly proteins from other non-human mammals, recombinant techniques can be employed. These techniques are well known in the art and include DNA library screening and PCR cloning.

DNA sequences encoding proteins of the invention can be prepared synthetically rather than cloned. The DNA sequence can be designed with the appropriate codons for the particular amino acid sequence. In general, one will select preferred codons for the intended host if the sequence will be used for expression. The complete sequence is assembled from overlapping oligonucleotides prepared by standard methods and assembled into a complete coding sequence.

Once coding sequences for the desired proteins have been prepared or isolated, they can be cloned into any suitable vector or replicon. Numerous cloning vectors are known to those of skill in the art, and the selection of an appropriate cloning vector is a matter of choice.

The gene can be placed under the control of a promoter, ribosome binding site (for bacterial expression) and, optionally, an operator (collectively referred to herein as “control” elements), so that the DNA sequence encoding the desired protein is transcribed into RNA in the host cell transformed by a vector containing this expression construction. The coding sequence may or may not contain a signal peptide or leader sequence. Leader sequences can be removed by the host in post-translational processing.

In addition to control sequences, it may be desirable to add regulatory sequences which allow for regulation of the expression of the protein sequences relative to the growth of the host cell. Regulatory sequences are known to those of skill in the art, and examples include those which cause the expression of a gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences.

An expression vector is constructed so that the particular coding sequence is located in the vector with the appropriate regulatory sequences, the positioning and orientation of the coding sequence with respect to the control sequences being such that the coding sequence is transcribed under the “control” of the control sequences (i.e., RNA polymerase which binds to the DNA molecule at the control sequences transcribes the coding sequence). Modification of the sequences encoding the particular antigen of interest may be desirable to achieve this end. For example, in some cases it may be necessary to modify the sequence so that it may be attached to the control sequences with the appropriate orientation; i.e., to maintain the reading frame. The control sequences and other regulatory sequences may be ligated to the coding sequence prior to insertion into a vector, such as the cloning vectors described above. Alternatively, the coding sequence can be cloned directly into an expression vector which already contains the control sequences and an appropriate restriction site.

In some cases, it may be desirable to add sequences which cause the secretion of the polypeptide from the host organism, with subsequent cleavage of the secretory signal. It may also be desirable to produce mutants or analogs of the antigens of interest. Mutants or analogs may be prepared by the deletion of a portion of the sequence encoding the protein, by insertion of a sequence, and/or by substitution of one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as site-directed mutagenesis, are well known to those skilled in the art.

Depending on the expression system and host selected, the proteins of the present invention are produced by growing host cells transformed by an expression vector described above under conditions whereby the protein of interest is expressed. The protein is then isolated from the host cells and purified. If the expression system secretes the protein into growth media, the protein can be purified directly from the media. If the protein is not secreted, it is isolated from cell lysates. The selection of the appropriate growth conditions and recovery methods are within the skill of the art. The proteins of the present invention may also be produced by chemical synthesis such as solid phase peptide synthesis, using known amino acid sequences or amino acid sequences derived from the DNA sequence of the genes of interest. Such methods are known to those skilled in the art. Chemical synthesis of peptides may be preferable if a small fragment of the antigen in question is capable of raising an immunological response in the subject of interest.

In particular, the inventors have expressed glomulin in bacteria as illustrated in the examples. To this end, they developed two prokaryotic glomulin expression constructs, one without and one with a histidine tag facilitating the purification step.

Furthermore, they developed two constructs which can be used for the generation of transgenic animals, as illustrated in FIG. 36.

The proteins of the present invention or their fragments can be used to produce antibodies, both polyclonal and monoclonal. If polyclonal antibodies are desired, a selected mammal, (e.g., mouse, rabbit, goat, horse, pig etc.) is immunized with an antigen of the present invention, or its fragment, or a mutated antigen. Serum from the immunized animal is collected and treated according to known procedures. If serum containing polyclonal antibodies is used, the polyclonal antibodies can be purified by immunoaffinity chromatography, using known procedures.

Monoclonal antibodies to the proteins of the present invention, and to the fragments thereof, can also be readily produced by one skilled in the art. The general methodology for making monoclonal antibodies by using hybridoma technology is well known. Immortal antibody-producing cell lines can be created by cell fusion, and also by other techniques such as direct transformation of B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. Panels of monoclonal antibodies produced against the antigen of interest, or fragment thereof, can be screened for various properties; i.e., for isotype, epitope, affinity, etc. Monoclonal antibodies are useful in purification, using immunoaffinity techniques, of the individual antigens which they are directed against.

Animals can be immunized with the compositions of the present invention by administration of the protein of interest, or a fragment thereof, or an analog thereof. If the fragment or analog of the protein is used, it will include the amino acid sequence of an epitope which interacts with the immune system to immunize the animal to that and structurally similar epitopes.

As illustrated in the examples, the inventors have used peptides synthesized from two amino acid sequences (SEQ ID NO 138 and SEQ ID NO 139) of glomulin, created polyclonal antisera against said peptides and tested them in Western blotting. These polyclonal antisera were used to detect the glomulin protein in vitro in the bacterial expression system. In addition, they were used to detect the presence of glomulin protein in various human tissues and eukaryotic cell lines. The inventors found that glomulin seems to be expressed in a variety of tissues ranging from cardiovascular tissues to brain parenchyma and carcinoma cell lines.

The present invention also relates to a method for treating or alleviating disorders with a vascular component comprising the use of a molecule which interferes, in a patient, with the expression or activity of a protein as defined in the claims or with the expression levels of the RNA encoded by the nucleic acids of the invention. A preferred molecule according to this embodiment is an antisense RNA molecule which is capable of hybridizing to the nucleic acid according to the invention. Advantageously, an antisense RNA molecule according to the present invention may be used as a medicament, or in the preparation of a medicament for the treatment of disorders with a vascular component (antisense RNA therapy). The present invention also provides a pharmaceutical composition comprising an antisense RNA molecule according to the invention together with a pharmaceutically acceptable carrier, diluent or excipient therefor.

A further aspect of the present invention provides a method for determining whether a compound is an inhibitor or an activator of expression or biological activity of the polypeptide of the invention which method comprises contacting a cell expressing the polypeptide of the invention or cell extracts thereof or purified polypeptide of the invention with said compound and comparing the level of expression of the protein of said cell or cell extract or comparing the level of activity of said purified polypeptide according to the invention against an equivalent amount which has not been contacted with said compound. Alternatively said compound may be determined to be an inhibitor or activator of expression of the RNA encoded by the nucleic acid of the invention. Any compounds identified as inhibitors may advantageously be used as a medicament or in the preparation of a medicament for treating disorders with a vascular component which are alleviated by reducing or increasing the expression or activity of a polypeptide of the invention or by reducing or increasing the expression of RNA encoded by a nucleic acid according to the invention. These polypeptides can be wild-type or mutant polypeptides.

In an alternative embodiment of the invention, the inhibitory compounds may comprise antibodies according to the invention capable of recognising an epitope of a protein according to the invention and binding thereto. In this embodiment, the pharmaceutical composition comprises an effective amount of said antibody. In the same manner as described above, compounds which are identified as activators or enhancers of activity or expression of a protein of the invention or activators or enhancers of the expression level of the RNA encoded by the nucleic acids of the invention may be utilised as a medicament or in the preparation of a medicament for treating disorders with a vascular component alleviated by overexpression or enhanced activity of said protein of the invention.

There is also provided by the present invention a method of screening to identify compounds which interact with and bind to a protein according to the invention, which method comprises contacting a host cell expressing said protein or cell extracts comprising said protein or purified protein of the invention with a selection of said compounds and identifying any compounds which interact with or bind to said protein. The compounds may, for example, be labelled with a marker such as biotin or the like or a radiolabel so as to facilitate detection of said binding. The invention further includes a method for producing a compound as defined here above, which involves steps known to the person skilled in the art. The present invention further includes methods for producing a composition comprising mixing such a compound with a suitable pharmaceutically acceptable carrier also known in the art.

According to a next embodiment, the present invention is related to a non-human transgenic animal transformed by a nucleic acid according to the invention, or a DNA construct according to the invention.

In a more preferred embodiment, the present invention relates to a method for the production of a genetically modified non-human animal in which this modification results in overexpression, underexpression or a knock-out of the nucleic acids of the invention, or the polypeptides of the invention.

Said animal is preferably a mammal such as a mouse or a rat, transformed by a vector according to the invention and overexpressing a protein according to the invention, or genetically modified by a partial or total deletion of its genomic sequence encoding the protein according to the invention (a knock-out non-human mammal) and obtained by methods well known by the person skilled in the art.

As illustrated in the examples, the present inventors have cloned genomic fragments of the mouse glomulin gene which could be used for homologous recombination, e.g. to obtain genetically modified ES cells and to generate transgenic animals. In particular, they developed two constructs, one which would lead to a glomulin null-allele, and a second one allowing a conditional knock-out of the glomulin gene.

Other examples of genetically modified non-human animals provided by the invention are for instance transgenic non-human animals comprising an antisense sequence as defined above and complementary to the nucleic acid sequences according to the invention, and placed in such a way that it is transcribed into antisense mRNA which is complementary to the nucleic acid sequences according to the invention and which hybridises to said nucleic acid sequences, thereby reducing or blocking their translation.

The present invention also relates to a transgenic non-human animal comprising in its genome a nucleic acid according to the invention for use as a model system for testing treatments for disorders with a vascular component.

The present invention also relates to a method for treating disorders with a vascular component by means of gene therapy, comprising administering, to a patient in need of a normal version of a nucleic acid or gene of the invention, at least part of said nucleic acid or, in the alternative, switching off or lowering the possible overexpression of a nucleic acid or gene of the invention in a disorder with a vascular component.

Known gene therapy protocols can consist of delivering nucleic acids, such as by means of expression vectors for transfection and expression of said nucleic acids so as to reconstitute the function of the affected gene, or alternatively delivering a functional form of the affected gene or protein. Expression constructs may be administered in any biologically effective carrier as known in the art. Retrovirus vectors, adenovirus vectors and adeno-associated virus vectors are exemplary recombinant gene delivery systems for the transfer of exogenous genes in vivo, particularly into humans.

In addition to viral transfer methods, non-viral methods can also be employed, such as liposomal derived systems, poly-lysine conjugates, and artificial viral envelopes.

In clinical settings, the gene delivery systems for therapeutic use can be introduced into a patient by any of a number of methods, each of which is familiar in the art.

The pharmaceutical preparation of the gene therapy construct can consist essentially of the gene delivery system in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded. Alternatively, where the complete gene delivery system can be produced intact from recombinant cells, e.g. retroviral vectors, the pharmaceutical preparation can comprise one or more cells which produce the gene delivery system.

The gene should be administered in a manner which results in sufficient expression of the non-defective gene. The following examples are for the purpose of better understanding the present invention but are in no way to be considered as limiting the invention.

The present invention also relates to an isolated nucleic acid molecule having a nucleotide sequence as represented in SEQ ID NO 142 to 152. These SEQ ID NOs represent the sequences of inter-exonic fragments obtained when determining the genomic structure of the VMGLOM gene (FIGS. 19 to 29). The inventors also assigned the name “glomulin” to the gene encoding the full length VMGLOM “long form”. The genomic structure of human glomulin is shown in FIG. 16 and further described in the examples. The gene is composed of 19 exons and extends over 55 kbp; the complete cDNA sequence is given in FIG. 5.

The present invention further provides a nucleic acid molecule as defined in the previous paragraph and having a nucleotide sequence containing a modification, wherein said modification results in patients bearing said modification in their genome having disorders with a vascular component.

According to a further embodiment said modification is selected from the group of nucleotide mutations consisting of point mutations, deletions, insertions, rearrangements, translocations and other mutations and preferably selected from the mutations as indicated in Table 8, such that the resulting nucleic acid sequence is altered.

The inventors identified 13 different mutations in this glomulin gene in 19 families. Nine of the mutations were deletions or insertions that cause frame-shifts resulting in premature stop codons. Therefore, it is likely that the venous malformations present in said families are caused by loss-of-function of glomulin. This finding suggests that glomulin is important for the differentiation of vascular smooth muscle cells, and thus for vasculogenesis and angiogenesis.
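To illustrate why a small deletion or insertion typically truncates the encoded protein, the following Python sketch counts the codons read before the first in-frame stop codon, before and after a single-nucleotide deletion. The sequence is a made-up toy example and not a glomulin sequence; the function name `codons_before_stop` is hypothetical.

```python
# Toy illustration of a frame-shift leading to a premature stop codon.
# The sequence below is hypothetical and is NOT taken from the glomulin cDNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds):
    """Count sense codons read from position 0 until the first in-frame stop."""
    count = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# ATG GCT TTA GAT GCT GAA GGA TAA : seven sense codons followed by a stop
wild_type = "ATGGCTTTAGATGCTGAAGGATAA"

# Delete the fourth nucleotide (index 3); the downstream frame shifts by one
mutant = wild_type[:3] + wild_type[4:]

print(codons_before_stop(wild_type))  # full-length reading frame
print(codons_before_stop(mutant))     # truncated reading frame
```

In this toy case the shifted frame reads ATG CTT TAG, so translation would stop after two codons instead of seven.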

According to another embodiment, the present invention relates to a probe or primer for use in the detection of a mutation occurring in a nucleic acid sequence according to the invention as defined above.

In particular, the inventors developed several sets of intronic primer pairs (Table 6) enabling the amplification of the 19 exons of the glomulin gene. These primers allow mutational screening via e.g. SSCP or heteroduplex analysis, directly on genomic DNA. This method is less laborious than screening on cDNA produced from RNA extracted from resected venous malformations or from cultured lymphoblasts.
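As a rough illustration of how the two primers of a pair might be compared before use in PCR, the Python sketch below computes length, GC fraction and an approximate melting temperature. Because the Table 6 sequences are not reproduced in this section, it uses the two cDNA primers quoted in the legend of FIG. 30 (SEQ ID NO 9 and SEQ ID NO 10). The Wallace rule of thumb (2 °C per A/T, 4 °C per G/C) is an assumption chosen for simplicity and is not stated to be the method used by the inventors.

```python
# Minimal primer-property sketch. The Wallace rule is a coarse rule of thumb
# for short oligonucleotides; it is an assumption made for this example only.

def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    seq = primer.replace(" ", "").upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = primer.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as quoted in the FIG. 30 legend
primer_1 = "TCT GGC CGA TTT TAG CAT CG"       # SEQ ID NO 9
primer_27 = "TAG TTT TTA TTT AGG AAA TGG AAC"  # SEQ ID NO 10

for name, primer in (("primer 1", primer_1), ("primer 27", primer_27)):
    length = len(primer.replace(" ", ""))
    print(name, length, round(gc_fraction(primer), 2), wallace_tm(primer))
```

A similar check could be applied to any primer pair; pairs whose estimated melting temperatures differ widely may need a compromise annealing temperature.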

Therefore, the present invention also relates to a method for the diagnosis of disorders with a vascular component in a patient comprising detecting a mutation in a nucleic acid sequence as defined above or detecting a nucleic acid as defined above.

According to a further embodiment, the present invention relates to a method for diagnosis of disorders with a vascular component in a patient comprising:

-   (a) providing a sample containing nucleic acids from said patient,
-   (b) isolating and possibly purifying nucleic acids from said sample,
-   (c) amplifying said nucleic acids using primers as defined above,
-   (d) analysing said amplified DNA indicative for the presence or absence of a mutation in said nucleic acids.

According to a more preferred embodiment, the present invention relates to the method as defined above wherein the amplification is performed by means of the polymerase chain reaction (PCR) and the primers as defined above. Several methods can be used to analyse an amplified DNA or a mutation characteristic for said disorders of the invention. Said methods include for instance SSCP, heteroduplex analysis, sequencing or any other method as described earlier in the description.

According to another aspect, the identification of the presence or absence of said mutation of the invention as defined above can also be done by means of a hybridisation reaction with a probe as defined above.

According to yet another aspect, the invention relates to a method for the diagnosis of disorders with a vascular component comprising the use of at least a nucleic acid sequence of the invention as defined above or a probe or primer as defined above.

According to another embodiment, the present invention relates to a kit for the diagnosis of disorders with a vascular component in a patient comprising at least a probe or primer according to the invention as defined above.

Said kit can be based upon a technique selected from the group consisting of in situ hybridisation, Northern blot hybridisation, Southern blot hybridisation, isotopic or non-isotopic labelling (by immunofluorescence or biotinylated probes), genetic amplification (especially by PCR or LCR), STS-PCR, contour-clamped homogeneous electric field (CHEF) gel electrophoresis, restriction mapping, FISH analysis, mismatch cleavage, single strand conformation polymorphism (SSCP) or any other method known in the art, or a mixture thereof.

FIGURE LEGENDS

FIG. 1.

Pedigrees of 7 additional families with venous malformations with glomus cells. Blackened symbols indicate affected persons and unblackened symbols indicate unaffected persons. A question mark (?) indicates a person for which the affection status is not known and a slash symbol (/) indicates a deceased person.

FIG. 2.

Schematic representation of the YAC map and STS localization. Genes are marked in bold italic and SNPs in small underlined capital letters (WIAF).

A) *=YAC clone reported to be chimeric. Results for marker WI-6020 are marked with an I and reflect database entries only. ?=unclear results for marker D1S2849 for YAC 896b3.

B) Numbers under the markers correspond to those in WC1.14 contig from Whitehead/MIT database. A=from (Allikmets et al. 1997). S=placed during the STSs localization. Order for markers #36 to #39 is inverted to reflect the order in the PAC-map (FIG. 2).

C) Boxes represent the areas of localization for the mapped or novel STSs. Vertical lines delimit the VMGLOM locus (unbroken lines) or the smaller, haplotype-shared area (dashed lines).

FIG. 3.

Schematic representation of the PAC based STS and transcript map of VMGLOM. Gene names are in bold italic and underlined. Polymorphic markers are in bold.

Novel CA-repeats are in bold, italic. Markers for which the order was impossible to define with the clones used are represented by gray boxes. The best annealing temperature for PCR is given for each STS. PAC clones forming the original four islands marked with bold lines. PAC with names in bold were used for fingerprinting. Underlined clones are selected for sequencing by The Sanger Center. X=positive PCR result, −=negative PCR result,

=new end-of-clone STS.

FIG. 4.

Picture of an agarose electrophoresis result for Hind III fingerprinting of the selected PAC clones. 1 kb=1 Kb DNA Ladder (Gibco BRL); the smallest marker band on the picture is 1018 bp.

FIG. 5

cDNA sequence for the human VMGLOM “long form” (SEQ ID NO 1).

FIG. 6

Predicted amino acid sequence for the human VMGLOM “long form” (SEQ ID NO 2).

FIG. 7

cDNA sequence for the human VMGLOM “short form” (SEQ ID NO 3).

FIG. 8

Predicted amino acid sequence for the human VMGLOM “short form” (SEQ ID NO 4).

FIG. 9

Alignment of the cDNA sequences of human VMGLOM “short form” (SEQ ID NO 3), “long form” (SEQ ID NO 1) and FAP-48 (U73704).

FIG. 10

Alignment of the predicted amino acid sequences of human VMGLOM “long form” (SEQ ID NO 2), FAP-48 (U73704) and VMGLOM “short form” (SEQ ID NO 4).

FIG. 11

cDNA sequence for the mouse VMGLOM “long form” (SEQ ID NO 5).

FIG. 12

Predicted amino acid sequence for the mouse VMGLOM “long form” (SEQ ID NO 6).

FIG. 13

cDNA sequence for the mouse VMGLOM “short form” (SEQ ID NO 7).

FIG. 14

Predicted amino acid sequence for the mouse VMGLOM “short form” (SEQ ID NO 8).

FIG. 15

Pedigrees of 7 additional families with venous malformations with glomus cells. Black symbols are affected patients. Individuals with numbers were tested. *, no clinical examination.

FIG. 16

Glomulin gene structure and mutations. Size of exons and three largest introns are shown. Other introns are to scale. Top, mutations that cause an immediate stop codon or *, single amino acid deletion. Bottom, frame-shift mutations leading to premature stop codons. Line below represents the exons encoding FAP48. The open reading frame of “glomulin” is roughly 30% longer than that of FAP48. This was identified due to both an additional 85 bp exon and an extra G in the gene encoding FAP48 (indicated by arrowheads in this figure). Both changes modify the open reading frame of “glomulin” resulting in a protein of 594 aa instead of 417 aa for FAP48.

FIG. 17

Glomulin northern blot analysis. Human multiple northern blot (Clontech) hybridized with a 482 bp 5′-probe of glomulin. Similar results obtained with full-length glomulin probe. This figure shows that glomulin is expressed in a large variety of tissues, and not only in the skin and subcutaneous tissue in which glomuvenous malformations are encountered.

FIG. 18

Upper chromatogram, control sequence; lower profile, mutant sequence; arrows, sites of mutation; *, reverse strand chromatogram. Δ, deletion; ins, insertion; >, substitution. Insets show segregation of the mutation by size difference (1,4,7-9,11), restriction enzyme digestion (2,3,6,13), heteroduplex analysis (5,10) or allele-specific PCR (12). C, control; ND, non-digested.

FIG. 19

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 142), exons 1 and 2 (underlined) and intron 1 and partially intron 2.

FIG. 20

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 143), exons 3, 4 and 5 (underlined) and introns 3 and 4, and partially introns 2 and 5.

FIG. 21

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 144), exon 6 (underlined) and partially introns 5 and 6.

FIG. 22

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 145), exon 7 (underlined) and partially introns 6 and 7.

FIG. 23

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 146), exon 8 (underlined) and partially introns 7 and 8.

FIG. 24

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 147), exons 9, 10, 11 and 12 (underlined) and introns 9, 10, 11 and partially introns 8 and 12.

FIG. 25

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 148), exons 13, 14 and 15 (underlined) and introns 13, 14 and partially introns 12 and 15.

FIG. 26

Human glomulin (VMGLOM) genomic sequence (SEQ ID NO 149), exons 16, 17 and 18 (underlined) and introns 16 and 17 and partially introns 15 and 18.

FIG. 27

Human genomic sequence: partial promoter, exon −1, intron −1, exon 1 and the beginning of intron 1. The exons are underlined (SEQ ID NO 150).

FIG. 28

Murine genomic sequence: partial promoter, exon −1, intron −1, exon 1, intron 1, exon 2 and partial intron 2. The exons are underlined (SEQ ID NO 151).

FIG. 29

Murine genomic sequence: exon 3, intron 3, exon 4, intron 4, exon 5, intron 5, exon 6, intron 6 and exon 7. The exons are underlined (SEQ ID NO 152).

FIG. 30

Human multiple tissue expression (MTE) dot blot (Clontech) hybridised with the full-length (1850 bp) glomulin cDNA probe (amplified from cloned fragment with primer 1: TCT GGC CGA TTT TAG CAT CG (SEQ ID NO 9) and primer 27: TAG TTT TTA TTT AGG AAA TGG AAC (SEQ ID NO 10)). All tissues show a positive hybridisation signal.

FIG. 31

Glomulin RT-PCR results on several human tissues. Multiple tissue RT-PCR using primers specific to a region of approximately 500 bp at the 5′ (A) or 3′ (B) end of the glomulin gene and covering multiple exon-intron boundaries. Control RT-PCR using primers specific to glyceraldehyde phosphate dehydrogenase, and glucose-6-phosphate dehydrogenase, demonstrated equal concentration of cDNA for every sample (results not shown). DNA size standards are indicated to the left. Lanes in A and B: 1, Artery cDNA; 2, Aorta cDNA; 3, Placenta cDNA; 4, Skeletal muscle cDNA; 5, Skin cDNA; 6, Smooth muscle cell cDNA; 7, Umbilical cord cDNA; 8, Umbilical vein cDNA; 9, Vena cava cDNA. Lane 10: plasmid containing glomulin insert (positive control), lane 11: water (negative control). (C) and (D): Lanes: 1, Artery cDNA; 2, Heart cDNA; 3, Placenta cDNA; 4, Skeletal muscle cDNA; 5, Umbilical cord 1 cDNA; 6, Umbilical cord 2 cDNA; 7, Umbilical vein cDNA. VA's Lane 8, Glomuvenous malformation cDNA extracted from a patient with known 5 bp mutation in the glomulin gene leading to a premature stop codon. Lane 9: Kaposiform hemangioendothelioma (KHE) cDNA. Lane 10: Venous malformation with known mutation in Tie 2/Tek gene. Lane 11: plasmid containing glomulin insert (positive control), lane 12: water (negative control). DNA size standards are indicated to the left. Control RT-PCR using primers specific to glyceraldehyde phosphate dehydrogenase, and glucose-6-phosphate dehydrogenase demonstrated equal concentration of cDNA for every sample (results not shown).

FIG. 32

Glomulin amino acid sequence (SEQ ID NO 153) showing in bold the sequences of the two synthesized peptides (207:SEQ ID NO 138; and 208: SEQ ID NO 139) used for polyclonal antisera production.

FIG. 33

ELISA results in triplicate for antiserum 455 against peptide 208 for preimmune serum, as well as for 1^(st) and 2^(nd) test samples at 43 and 71 days of the injection program.

FIG. 34

Western dot blot showing immunoreactivity of antisera 452, 453, 454 and 455 at a dilution of 1:500 against the synthesized peptides 207 and 208 in amounts between 1 and 100 mg (A). In B, higher dilutions of 455 were tested.

FIG. 35

Prokaryotic glomulin expression constructs. pET-15b introduces a histidine tag at the amino terminus of glomulin; pET-3a encodes the wild-type, non-tagged glomulin.

FIG. 36

Knock-out constructs. A) The lacZ knock-out construct leading to lacZ transcription under the control of endogenous glomulin promoter after homologous recombination in ES cells; B) The conditional knock-out construct creates a glomulin allele that can be made deficient at a given tissue or time point by the introduction of Cre-recombinase.

FIG. 37

Glomulin RT-PCR results on four murine embryos. From left to right: embryonic day (E) 10, 14, 16 and 18. Lane 5, plasmid DNA containing the 5′ end of the glomulin gene as insert. On the right of the diagram, the Low Range marker (Fermentas) is shown with the corresponding DNA sizes in base pairs alongside.

FIG. 38

Western blot using glomulin anti-peptide antibody 452 shows binding to a 67 kDa and roughly 100 kDa protein in lysates from expression constructs of pET-15b-glomulin. Binding to a protein of 58 kDa is observed in human tissue. Lanes 1 & 2, pET-15b-glomulin (His-tagged) transformed BL21 bacterial lysate at 8 hours post-induction from supernatant and pellet fractions respectively. Lane 3, same bacterial strain as 1 & 2 at time 0. Lane 4, lysate from HEK293T cells. Lane 5, protein extract from vena cava tissue. Lane 6, Nickel column purified His-tagged glomulin. Lanes 7 & 8, pET-15b transformed BL21 bacterial strains following IPTG induction of an unrelated gene. Protein standards are indicated to the left of the diagram in kDa.

FIG. 39

Western blot using glomulin antiserum 453 shows binding to a 67 kDa and roughly 100 kDa protein in lysates from expression constructs. Lanes 1 & 2, pET-15b-glomulin (His-tagged) transformed BL21 bacterial lysate at 8 hours post-induction from supernatant and pellet fractions respectively. Lane 3, same bacterial strain as 1 & 2, at time 0. Lane 4, lysate from HEK293T cells. Lane 5, protein extract from vena cava tissue. Lane 6, Nickel column purified His-tagged glomulin. Lanes 7 & 8, pET-15b transformed BL21 bacterial strains following IPTG induction of an unrelated gene. Protein standards are indicated to the left of the diagram in kDa.

FIG. 40

A) Western blot using anti-glomulin antiserum 455 shows specific binding to a 67 kDa protein in the supernatant of bacterial cell lysates. Lanes 1-9: pET-15b-glomulin transformed BL21 bacterial lysates at 0-8 hours after IPTG induction. Lanes 10 and 11: lysates from uninduced cultures at 5 and 7 hours respectively. Lane 12: control lysate from BL21 bacteria transformed with pET-15b-human glucokinase regulatory protein expression construct (used as a negative control). Protein standards are indicated to the left of the diagram in kDa. Each lane contains 27 μg of total protein as determined by the BCA-200 assay (Pierce). B) Western blot using anti-histidine tag antibody (Amersham-Pharmacia) shows specific binding to a 67 kDa protein in the supernatant of bacterial cell lysates from BL21 bacteria transformed with pET-15b-glomulin construct. Lanes 1-9: pET-transformed BL21 bacterial lysates at 0-8 hours after IPTG induction. Lanes 10 and 11: lysates from uninduced cultures at 5 and 7 hours respectively. Lane 12: control lysate from BL21 bacteria transformed with pET-15b-human glucokinase regulatory protein expression construct. Protein standards are indicated to the left of the diagram in kDa. Each lane contains 54 μg of total protein as determined by the BCA-200 assay (Pierce).

FIG. 41

Western blots from bacterial cell lysates expressing glomulin as a transgene, and tissues in which endogenous glomulin is present. A) Note the difference in size between glomulin expressed in pET-3a (lane 3), purified glomulin from pET-15b—histidine-tagged (lane 1), and endogenous glomulin from heart tissue (lane 2). B) Various human tissues in which protein was extracted and glomulin expression assessed. Lanes 1-6: vena cava, umbilical cord, placenta, heart, aorta, and umbilical vein. Equal loading of all samples was confirmed by the Pierce BCA-200 protein absorbance assay. Protein standards are marked in kDa to the left of the two figures.

FIG. 42

Western blots from tissues taken from an autopsy from a normal individual. Fifty micrograms of total protein was loaded in each lane unless otherwise stated. Protein size standards are indicated to the left in kDa. Tissues from A are: 1) aorta, 2) vena cava, 3) uninduced pET-15b-glomulin lysate, 4) 7 μg column-purified glomulin, 5) protein isolated from the lesion of a patient with Maffucci syndrome, 6) renal artery, 7) atrium, 8) splenic artery, 9) pulmonary artery, 10) sub-clavial artery, and 11) primitive carotid artery. Tissues in B are: 1) vena cava, 2) 7 μg column purified glomulin, 3) protein isolated from a lesion of a patient with Maffucci syndrome, 4) skin, 5) liver, 6) testicle, 7) left ventricle, 8) right ventricle, 9) supra-renal vena cava, 10) portal vein, 11) inferior vena cava.

TABLE 1A+B

Haplotypes A and B sharing in VMGLOM. Numbers indicate sizes of alleles that segregate with the disease in each family. At the top, symbol and geographic origin of family. USA: The United States of America; Bel: Belgium; Sco: Scotland; It: Italy; Fra: France; Ger: Germany; Yug: Yugoslavia. +: tetranucleotide repeat microsatellite. R: a recombinant individual in the family for this marker. X/Y: data not informative for linked allele. Alleles with a probable ancestral mutation differ from shared haplotype: white background; n/N: number of the shared allele on total number of alleles; fam: families linked to VMGLOM; con: control individuals from the Belgian population. P: P-value for the uncorrected chi-square test in a 2×2 table; *: significant P-value (p<0.01).

Table 2

Sixty-four control haplotypes, deduced from 16 father-mother-child triplets. Alleles of haplotype A have been boxed and shaded in gray. F: inferred haplotypes of father. M: inferred haplotypes of mother. T: haplotype transmitted to child. NT: haplotype not transmitted to child.

Table 3

Sixty-four control haplotypes, deduced from 16 father-mother-child triplets. Alleles of haplotype B have been boxed and shaded in gray. F: inferred haplotypes of father. M: inferred haplotypes of mother. T: haplotype transmitted to child. NT: haplotype not transmitted to child.

Table 4

Primer sequences for the 22 new end-of-clone STSs with fragment sizes in base pairs.

Table 5

Primer sequences for nine novel CA-repeats with number of heterozygotes identified in 16 controls.

Table 6

Primer sequences for 18 intronic primer pairs enabling the amplification of 18 exons of the human VMGLOM gene.

Table 7

Exon-intron structure of the human VMGLOM gene with exonic and intronic sizes.

Table 8

Identified mutations in the VMGLOM gene. The numbering of said mutations refers to the nucleotide numbering as used in FIG. 5, where +1 is the A of the ATG codon at positions 39 to 41.
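The numbering convention of Table 8 can be made concrete with a small Python sketch that converts between positions in the FIG. 5 cDNA sequence and the coding numbering in which +1 is the A of the ATG at positions 39 to 41. The function names are hypothetical, and the treatment of upstream positions (no position 0, so that FIG. 5 position 38 becomes −1) is an assumption based on common practice rather than on the text.

```python
# Conversion between FIG. 5 positions and mutation numbering (Table 8).
# ATG_START comes from the legend of Table 8; everything else is illustrative.
ATG_START = 39  # position of the A of the ATG codon in the FIG. 5 sequence


def fig5_to_coding(position):
    """Convert a 1-based FIG. 5 position to coding numbering (+1 = A of ATG)."""
    if position >= ATG_START:
        return position - ATG_START + 1
    # Assumption: upstream positions are numbered -1, -2, ... with no zero
    return position - ATG_START


def coding_to_fig5(coding_position):
    """Inverse conversion, valid for positions at or downstream of the ATG."""
    if coding_position < 1:
        raise ValueError("only positions >= +1 are handled in this sketch")
    return coding_position + ATG_START - 1


def codon_number(coding_position):
    """1-based codon index containing a coding position (codon 1 = ATG)."""
    return (coding_position - 1) // 3 + 1


print(fig5_to_coding(39), fig5_to_coding(41), fig5_to_coding(38))
print(coding_to_fig5(1), codon_number(4))
```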

EXAMPLES Materials and Methods

Families

In addition to the families already described in Boon et al., 1999, 7 new families were identified. After informed consent, a clinical history was taken and physical examination was performed on all family members participating in the study. Venous blood samples were drawn for extraction of DNA. An additional sample was drawn for lymphocytic transformation from individuals Bln12, Bln100, Bln102, Sch12, Sch100, Sch102, Del101, Ad3, Ad12, Lml145, Lml181 and Lml183. The pedigrees are shown in FIG. 1.

Linkage Analysis

Genomic DNA was extracted from the buffy coat (QIAGEN DNA extraction kit). Genotyping of individuals was performed as described elsewhere (Boon et al., 1994). All microsatellite markers located in the VMGLOM region on the basis of various databases (CEPH, CHLC, MIT/Whitehead; see the electronic database information below) were used. In addition to published polymorphic markers in the region, additional CA repeat microsatellites were isolated as part of the construction of a physical map of the region (33CA1, 50CA1, 69CA1 and 75CA1, Brouillard et al., unpublished). Linkage calculations were performed using the MLINK program of the LINKAGE package (Lathrop, 1984). The parameters were for an autosomal dominant disease with 90%, 80% and 70% penetrance, for individuals older than 16 years, between 10 and 16 years, and younger than 10 years, respectively, a disease allele frequency of 10⁻⁵ and 10 marker alleles with equal frequencies of 0.1. The LOD scores were calculated for a recombination fraction θ equal to 0.01, 0.05, 0.1, 0.2, 0.3 or 0.4.
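For readers unfamiliar with LOD scores, the following Python sketch shows the age-dependent penetrance classes listed above and a textbook two-point LOD score for phase-known, fully informative meioses. It is a simplified illustration only: MLINK evaluates full pedigree likelihoods, including penetrance and allele frequencies, which this sketch does not attempt. The function names, the example counts and the handling of ages of exactly 10 or 16 years are assumptions.

```python
import math

# Recombination fractions at which LOD scores were calculated (from the text)
THETAS = (0.01, 0.05, 0.1, 0.2, 0.3, 0.4)


def penetrance(age_years):
    """Penetrance class by age; boundary handling at 10 and 16 is assumed."""
    if age_years > 16:
        return 0.9
    if age_years >= 10:
        return 0.8
    return 0.7


def lod_score(theta, n_meioses, n_recombinants):
    """Two-point LOD for phase-known meioses:
    log10( theta^R * (1 - theta)^(N - R) / 0.5^N )."""
    linked = theta ** n_recombinants * (1 - theta) ** (n_meioses - n_recombinants)
    unlinked = 0.5 ** n_meioses
    return math.log10(linked / unlinked)


# Hypothetical example: 10 informative meioses, no recombinants observed
for theta in THETAS:
    print(theta, round(lod_score(theta, 10, 0), 2))
```

With no recombinants, the score is highest at the smallest θ and decreases as θ grows, which is the pattern expected for a tightly linked marker.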

Haplotype Sharing

In order to detect haplotype sharing, 3 affected individuals from each family were genotyped for every available microsatellite marker in the VMGLOM locus, except for families Ba and Al, where only 2 affected individuals are present. The radioactive PCR products for each marker were resolved on a separate polyacrylamide gel to allow a consistent scoring of the alleles across the families. The slowest allele was assigned number 1, with allele numbers increasing with mobility in the gel. Unscorable alleles were assigned number 0 (see Table 1).
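The allele numbering scheme described above can be sketched in Python as follows. The sketch assumes that mobility is inferred from fragment size, with larger fragments migrating more slowly, and represents unscorable alleles as `None`; the function name and the example sizes are hypothetical.

```python
# Allele numbering by gel mobility: slowest allele = 1, unscorable = 0.
# Assumption: larger fragments migrate more slowly in the polyacrylamide gel.

def number_alleles(sizes_bp):
    """Map observed fragment sizes (bp, or None if unscorable) to allele numbers."""
    distinct = sorted({s for s in sizes_bp if s is not None}, reverse=True)
    rank = {size: index + 1 for index, size in enumerate(distinct)}
    return [rank[s] if s is not None else 0 for s in sizes_bp]


# Hypothetical sizes scored for one marker across several individuals
observed = [150, 146, None, 150, 148]
print(number_alleles(observed))
```

Numbering all individuals for a marker in a single call mirrors the use of one gel per marker, so that the same fragment receives the same number in every family.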

To assess the degree of linkage disequilibrium of the shared haplotypes in the affected families, the frequencies of these haplotypes in the general population were estimated (see Tables 2 and 3). Sixteen triplets (father-mother-child) belonging to the genetically heterogeneous Belgian population were genotyped for the markers. Two triplets of affected individuals from the families were included as controls to provide an internal reference for the size of the alleles. Each marker was resolved on a separate gel. Haplotypes were constructed by eye based on the inheritance of parental alleles to child, assuming no recombination. When a marker was uninformative because the father, the mother and the child have the same genotype, we always tried to maximize the occurrence of the shared haplotypes.
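The legend of Table 1 refers to an uncorrected chi-square test in a 2×2 table comparing the shared allele count in families (fam) with that in controls (con). A minimal Python sketch of such a test is given below. The counts are invented for illustration and are not taken from Table 1, and the function name is hypothetical; the P-value uses the identity for one degree of freedom, P = erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square test for the 2x2 table [[a, b], [c, d]].

    a, b: shared / other alleles in families; c, d: same in controls.
    Returns (chi_square, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: shared allele on 8 of 12 family chromosomes
# versus 10 of 64 control haplotypes
chi2, p = chi_square_2x2(8, 4, 10, 54)
print(round(chi2, 2), p < 0.01)
```

No continuity correction is applied, in line with the word "uncorrected" in the legend; with small expected counts an exact test might be preferred.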

YAC Clones

Yeast strains containing YAC clones were ordered from Research Genetics or from C.E.P.H. (clones 736E1, 751F11, 848E3, 898E4, 917B5 and 948C3). They were grown in YPD (yeast extract/peptone/D-glucose) medium and DNA was extracted according to Current Protocols in Molecular Biology (Unit 6.10.2).

STS Markers

STS markers were selected on the basis of their localization in databases (Sanger, Unigene, Science maps '96, '98 & '99) close to VMGLOM (a205xD5-D1S2775). Primers for the STSs were synthesized by Gibco BRL on the basis of the sequence information from various databases (Genbank, GDB, dbSNP). Novel STSs were created from our PAC-end sequences (Table 4) and from the sequences of the (GT)16-positive clones (Table 5). All markers were amplified by PCR in a 10 μl reaction volume using 10 ng of DNA. The amplification conditions were: [95° C., 3′; (95° C., 30″; 55-65° C., 30″; 72° C., 30″)×35; 72° C., 10′] using 0.25 units of the Biotools DNA polymerase™ (Labsystems).
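
The bracketed cycling notation can be read as an initial denaturation, a repeated three-step cycle, and a final extension. The following sketch, with a hypothetical helper name, adds up the programmed hold times for the STS amplification; ramping time between temperatures is not included, so the actual run time on a thermocycler would be longer.

```python
def program_seconds(initial_s, cycle_steps_s, cycles, final_s):
    """Total programmed hold time of a PCR program, ignoring ramp times."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s

# [95 C, 3'; (95 C, 30"; 55-65 C, 30"; 72 C, 30") x 35; 72 C, 10']
sts_total_s = program_seconds(
    initial_s=3 * 60,
    cycle_steps_s=[30, 30, 30],
    cycles=35,
    final_s=10 * 60,
)
sts_total_min = sts_total_s / 60
```

The annealing temperature varies between 55° C. and 65° C. depending on the marker, but this does not change the programmed hold time.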

PAC Clones

E. coli strains containing PAC clones were provided by The Sanger Centre (UK) except for the clones 103d10, 10406 and 226k2 that were ordered from HGMP. Colonies were isolated on LB-agar plates containing 30 μg/ml kanamycin (ICN). DNA extractions from 1.5 ml overnight cultures were carried out according to a protocol from BACPAC resources, except that, at the end, the DNA pellets were resuspended in 200 μl of 10 mM Tris-HCl, pH=7.5, containing 0.1 mM EDTA.

PAC End-sequencing

Selected PAC clones were purified with the QIAGEN Plasmid Midi Kit using the QIAGEN protocol with slight modifications. Briefly, 100 ml of an overnight bacterial culture was divided into two tubes and the bacteria were pelleted by centrifugation. For each tube, 10 ml of P1, P2 and P3 were used. After the two steps of centrifugation, the supernatants were pooled and applied to the columns. Elution was done with 5 times 1 ml of QF buffer, pre-warmed to 65° C. DNA was precipitated as mentioned in the kit and resuspended into 200 μl of 10 mM Tris-HCl, pH=7.5, containing 0.1 mM EDTA.

Sequencing reactions were done using the Thermo SEQUENASE kit RPN2538 (Amersham). A 24 μl pre-mix containing 2 μg of purified PAC DNA and 3 μmol of IRD-800 fluorescent primer was divided into four tubes, each containing 2 μl of the appropriate nucleotide mix. The SP6 and T7 primers were synthesized by MWG Biotech. The cycle-sequencing program used was: [95° C., 5′; (95° C., 30″; 54° C. for primer T7 (5′-TAA TAC GAC TCA CTA TAG GG-3′) (SEQ ID NO 111) or 50° C. for primer SP6 (5′-CAT TTA GGT GAC ACT ATA G-3′) (SEQ ID NO 112), 30″; 70° C., 1′)×50]. 5 μl of the loading buffer were added and samples were denatured for 5 min before separation on a 66 cm 4% Long Ranger gel with the DNA4000L™ sequencer (LI-COR).

PAC-library Screening

To identify new PAC clones, the Human RPCI-1 PAC library filters (Ioannou and de Jong 1996), provided by HGMP, were screened by Southern blot hybridizations using PCR-amplified end-of-clone STSs as probes (Table 1). PCR products were purified with QIAQUICK PCR purification kit™ (QIAGEN) prior to radiolabelling with [α³²P]-dCTP (Amersham). Hybridizations were performed as previously described (Boon et al. 1999).

Isolation of Novel CA-repeats

Isolation of novel CA-repeats from genomic DNA clones was performed as described (Klockars et al. 1996; Paavola et al. 1999). Briefly, 350 ng of PAC DNA was digested with Sau3A I, ligated to BamH I-digested pBLSK+, transformed into XL1-blue cells and plated on LB medium containing 100 μg/ml ampicillin. The colonies were transferred onto HYBOND-N membranes (Amersham) according to the manufacturer's protocol. A (GT)₁₆ oligonucleotide, synthesized by Gibco BRL, was end-labeled with [α³²P]-ATP (Amersham) and hybridization was carried out as described (Boon et al. 1999). Positive colonies were picked and plasmid DNA was isolated with the QUANTUM PREP plasmid miniprep kit (Bio-Rad). The clones were sequenced with the M13 forward and reverse primers using the CEQ DTCS kit (Beckman), and an 8-capillary CEQ2000 sequencer (Beckman). The degree of polymorphism for the novel markers was tested by genotyping 16 unrelated control individuals as previously described (Boon et al. 1994).

Fingerprinting

44 μl of the mini-prep DNA extractions of the selected clones were digested with 18 units of Hind III in a final volume of 50 μl, for 2 hours. Digests were loaded on a 0.9% agarose gel (18 cm long), containing 0.8 μg/ml ethidium bromide. The gels were run at 70V (2.4 V/cm) for 15-20 hr. Pictures taken were manually analyzed.

Cloning of the VMGLOM Genes

The PAC end sequence 33SP6 (unpublished) identified ESTs homologous to FAP48, as well as the published FAP48 cDNA, in nBLAST searches. To clone the gene, primers were synthesized from the beginning and the end of the published FAP48 sequence (primers: VMGLOM-1: 5′-TCTGGCCGATTTTAGCATCG-3′ (SEQ ID NO 113) and VMGLOM-27: 5′-TAGTTTTTATTTAGGAAATGGAAC-3′ (SEQ ID NO 114)). Using total RNA extracted from EBV-transformed lymphoblasts, the gene was amplified and cloned into pBLSK⁺ vector by T/A cloning. For this, pBLSK⁺ vector was digested with EcoRV and thymidines were added with Tth DNA polymerase (Labsystems). Inserts were sequenced using vector primers (M13 F and R) and the Thermo SEQUENASE kit RPN2538™ (Amersham). A 24 μl pre-mix containing 2 μg of purified DNA and 3 μmol of IRD-800 fluorescent primer was divided into four tubes, each containing 2 μl of the appropriate nucleotide mix. The cycle-sequencing program used was: [95° C., 5′; (95° C., 30″; 55° C., 30″; 70° C., 1′)×35]. 5 μl of the loading buffer were added and samples were denatured for 5 min before separation on a 66 cm 4% LONG RANGER gel (FMC BioProducts, Rockland, Me.) with the DNA4000L sequencer (LI-COR).

The obtained sequences were compiled to obtain full-length sequences (FIGS. 5 and 7), which were compared to the published FAP48 sequence (FIG. 9). The corresponding predicted amino acid sequences are 594 and 98 residues long (FIGS. 6 and 8).

To clone the human gene (including the introns), several exonic primer pairs were synthesized and used for PCR with genomic DNA as template. Gradually all introns were amplified. The ends of these amplified fragments were sequenced either directly or after cloning the PCR products. 18 separate exons were identified (Table 6) and the intronic sizes could be estimated (Table 7). The sequences of the exon/intron boundaries are given further below (see "Further Determination of the Genomic Structure of the VMGLOM Gene").

To clone the mouse cDNA, the human VMGLOM cDNA sequence was aligned with identified mouse EST sequences. On the basis of these mouse ESTs, primers were selected from the 5′ end (before the putative ATG codon in the mouse sequences) and from the 3′ end (after the putative STOP codon in the mouse sequences): mVMGLOM-1, 5′-AATGGCTGTGGAGGAACTTC-3′ (SEQ ID NO 11) and mVMGLOM-5, 5′-GCATTTTGTTGGTTTTTATTTATG-3′ (SEQ ID NO 12). These primers were used to amplify the full-length murine cDNA, which was cloned into pBLSK⁺ vector by T/A cloning. As above, inserts were sequenced using vector primers M13 F and R on the DNA4000L sequencer (LI-COR). The obtained sequences were compiled to obtain full-length sequences (FIGS. 11 and 13). The corresponding predicted amino acid sequences are 573 and 98 residues long (FIGS. 12 and 14). A separate paragraph relating to the cloning of genomic fragments of the mouse glomulin gene is incorporated further below.

Identification of Mutations

Patient cDNA or DNA was amplified using exonic or intronic primer pairs. The size of the amplification products varied roughly between 200 and 350 bp (Table 6). For single stranded conformation polymorphism (SSCP) and heteroduplex analysis, both PCR primers were end-labeled with α³²P using polynucleotide kinase (TAKARA), according to the manufacturer's recommendations. The PCR reactions were divided into two aliquots before loading onto non-denaturing polyacrylamide gels (MDE gel solution, FMC). EDTA (final concentration 5 mM) and non-denaturing loading buffer (according to FMC) were added to the reactions for heteroduplex analysis, whereas a denaturing loading buffer (according to FMC) was added to the SSCP samples. After heat-denaturation, the samples for SSCP analysis were immediately loaded onto SSCP gels. The samples for heteroduplex analysis were first cooled from 95° C. to 37° C. at one degree centigrade per minute to increase the formation of heteroduplexes. Both gels were run for 14-16 hours, SSCP gels at constant power (6-8 W), and heteroduplex gels at constant potential (700 V). Gels were vacuum-dried and exposed for 12-24 hours to KODAK X-OMAT™ film. Fragments showing abnormal migration were reamplified, purified (Qiagen PCR columns), and cycle-sequenced using Beckman fluorescent dye-terminator technology (CEQ DTCS™ kit) and the Beckman CEQ 2000™ capillary sequencer.

Seven New Families with Glomuvenous Malformations

The inventors studied seven additional families (FIG. 15), one patient with familial history of the disorder (R1), and one sporadic case (BG). For genomic DNA extraction, buccal-cell brushes were obtained from individuals Blo-52 and Blo-810. Venous blood samples were drawn for others. A second blood sample was obtained from some individuals for lymphocytic transformation with Epstein-Barr virus. Immunohistochemistry was performed as described (Boon et al., 1999).

Northern Blots

Hybridizations of the Human Multiple Tissue Northern (MTN)® Blot were carried out according to the protocol for Human Multiple Tissue Expression (MTE)™ Dot Blot (Clontech Laboratories, CA, USA). Two different probes that were radiolabelled by random-priming with [α³²P]-dCTP were used: full-length glomulin coding sequence and a 482 bp 5′-fragment (nt −23 to +459). The filters were exposed to Biomax films (Kodak) or analysed by phosphorimager (Molecular Dynamics). 5′ RACE, using gene-specific primers 5′-GCT GAT TCC AAA GGG TAG AC-3′ (SEQ ID NO 115), 5′-TGG GAT ATC TGT TTT CCA GAG-3′ (SEQ ID NO 116) and 5′-CTA TCC TCT TTA TCT TTA CAC-3′ (SEQ ID NO 117), was done with 5′RACE System for Rapid Amplification of cDNA Ends (Life Technologies).

Human Multiple Tissue Expression Dot Blot

Hybridizations of the Human Multiple Tissue Expression Dot Blot (MTE)™ (Clontech Laboratories, CA, USA) were carried out according to the protocol for Human Multiple Tissue Expression Dot Blot™ (Clontech Laboratories, CA, USA). The full-length coding sequence of human glomulin, radioactively labelled by γP³² and amplified using exonic primers Primer 1: TCT GGC CGA TTT TAG CAT CG (SEQ ID NO 118) and Primer 27: TAG TTT TTA TTT AGG AAA TGG AAC (SEQ ID NO 119), was used as a probe (FIG. 30). The analysis was done as for MTN® hybridisations.

Human Multiple Tissue RT-PCR Analysis

For testing glomulin expression in human tissues by RT-PCR, cDNAs were prepared by reverse-transcription using the SUPERSCRIPT™ kit according to the recommendations of the manufacturer (Gibco-BRL). Tissues tested included: an artery, aorta, heart, placenta, skeletal muscle, skin, cultured smooth muscle cells (a gift from Dr. B. Kraling, Heidelberg, Germany), umbilical cord, umbilical vein, vena cava, glomuvenous malformation resected from a patient with a known 5 bp mutation in the glomulin gene, kaposiform hemangioendothelioma (KHE), and a venous malformation with as of yet no known mutation in TIE2/TEK gene. A plasmid containing glomulin cDNA was used as a positive control, and water as negative control. 5 μg of total RNA was used for cDNA synthesis. 1 μl out of the 20 μl reverse transcription product was used as template for PCR. Primer pairs “15”: GCA CAC AGA CCA GCT ATT AG (SEQ ID NO 120) and “8”: TCA AAG AAT TGT GCT GTC AGC (SEQ ID NO 121) from exons 2 and 6, and “25”: AGT TTA GCT ATG CTT CAG CTG (SEQ ID NO 122) and “19”: GGA GGC ATA TTA GGG ATC TC (SEQ ID NO 123) from exons 12 and 17 are specific to regions of 561 bp and 503 bp, respectively, at the 5′ (FIGS. 31B and D) and 3′ (FIGS. 31A and C) ends of the glomulin gene. Both cover multiple exon-intron boundaries. PCRs were performed in standard conditions with cycling conditions: 95° C., 4′ for initial denaturation followed by 35 cycles of 95° C., 30″, 60° C., 30″, 72° C., 40″, followed by a 10′ final extension at 72° C. (FIG. 31). Control RT-PCR using primers specific to glyceraldehyde phosphate dehydrogenase (TTG GTA TCG TGG AAG TAC TCA (SEQ ID NO 124) and TGT CAT CAT ATT TGG CAG GTT T (SEQ ID NO 15)), and glucose-6-phosphate dehydrogenase (ATC GAC CAC TAC CTG GGC AA (SEQ ID NO 126) and TTC TGC ATC ACG TCC CGG A (SEQ ID NO 127)) were used as positive controls for all the cDNAs (results not shown).

Mouse Developmental Stage RT-PCR Analysis

cDNAs were prepared on total RNAs extracted from murine embryos of embryonic days (E) 10, 14, 16 and 18 (a gift from Dr. P Chomez, Ludwig Institute for Cancer Research, Brussels, Belgium). cDNAs were prepared by reverse-transcription using the SUPERSCRIPT™ kit (Gibco-BRL) using 2 μg of total RNA extracted from total murine embryos. 1 μl out of 20 μl of the prepared cDNA was used as template for PCR using primers AAT GGC TGT GGA GGA ACT TC (SEQ ID NO 128) for the forward primer and CAT CGA ACA ACT GGA CCA AC (SEQ ID NO 129) for the reverse primer. The amplified DNA product was 196 base pairs in length and covered 2 exon-intron boundaries, from exon 1 to exon 3. PCRs were performed in standard conditions with cycling conditions as follows: 95° C., 4′ for initial denaturation followed by 37 cycles of 95° C., 30″, 60° C., 30″, 72° C., 40″, followed by a 10′ final extension at 72° C. Control RT-PCR using primers specific to glyceraldehyde phosphate dehydrogenase (TTG GTA TCG TGG AAG TAC TCA (SEQ ID NO 130) and TGT CAT CAT ATT TGG CAG GTT T (SEQ ID NO 131)) were used as positive control for all cDNAs (results not shown).

Further Determination of the Genomic Structure of the VMGLOM Gene

Exon/intron boundaries for exon 3 were identified by sequencing the SP6-end of the PAC clone 775d17 (Brouillard et al., 2000). To define the remainder of the genomic structure, 36 primers were designed based on the glomulin cDNA sequence. Different combinations of these exonic primers were used for PCR on PAC clones 775d15 and 1090n11. Inter-exonic fragments obtained were partially sequenced to identify exon/intron boundaries (FIG. 16), using a DNA4000 (Li-Cor) or a CEQ2000 (Beckman) fluorescent sequencer.

Cloning of the VMGLOM Genes

To further study the structure of the 5′ end of the glomulin gene, the glomulin cDNA sequences obtained from 5′ RACE experiments and the 3′ sequences obtained during the cloning of the full-length glomulin cDNA were used to screen public sequence databases, especially dbEST and Unigene, and the human genome draft sequence database, to see whether the exon-intron structure of the glomulin gene was complete regarding the ends of the glomulin cDNA sequences. Part of the 5′ cDNA sequence (the 8 first nucleotides in FIG. 5) was not covered by the genomic sequences of the investigators, but was identified in a PAC sequence in the human draft sequences. This sequence is located 894 bp upstream of exon 1, thus creating a new intron and exon (a 19th exon named exon −1). The sequence of exon −1 and the surrounding intronic and promoter sequences are given in FIG. 27. Primers TAC CTG CGG CTT TTC GAG AG (SEQ ID NO 132) and ACC CTG AAC CTC TCC ACA AC (SEQ ID NO 133) were synthesized allowing the amplification of this exon for mutational screening using genomic DNA as template, as described for other exons. In addition, a new intronic forward primer CTT AAG TGT AAT ATC ACG GAT AG (SEQ ID NO 134), was synthesized for exon 1 genomic amplification, and replaced the forward primer in Table 6.

Cloning of Genomic Fragments of the Mouse Glomulin Gene

To allow the construction of glomulin null-alleles, which would be introduced into murine embryonic stem cells (ES cells) by homologous recombination, large fragments of the murine glomulin gene were cloned and sequenced. To do this, several exonic primer pairs were synthesized and used for PCR, with murine genomic DNA from a female of the strain 129S6/SvEvTac as template. Gradually all introns were amplified between exons 1 and 7. The ends of these amplification products were sequenced either directly or after cloning the PCR products into the pBLSK+ (Stratagene) vector. To get the full-length sequences of the introns, except for intron 2, which is about 10 kbp, additional primers were synthesized on the basis of the already obtained intronic sequences, and thus by “genomic walking” the complete sequences were obtained (FIGS. 28 and 29).

The 5′ end of intron 1 and the sequences upstream of murine exon 1 were obtained by subcloning and sequencing a murine PAC clone known to contain exon 1. Briefly, PAC clone 587o16 was digested with BamHI and the fragments were ligated into pBLSK+ (Stratagene). The products were transformed into E. coli and the bacteria were plated to obtain isolated colonies. These ‘libraries’ were transferred onto nylon membranes that were hybridized with a probe corresponding to murine exon 1 and the beginning of intron 1. One subclone containing a BamHI/BamHI insert of about 8 kbp was identified. This clone contained sequences up to 6 kbp upstream of exon 1. Ends of this clone were sequenced with universal F and R primers as well as with a reverse primer of exon 1. To speed up the sequencing, the 8 kbp insert was further subcloned using EcoRV, PstI, PvuII and Sau3aI restriction enzyme cutting sites. Several of these subclones were sequenced, and this shotgun sequencing provided pieces covering altogether about 4 kbp. New primers were designed at the ends of these pieces of sequences and by ‘walking’, the gaps were closed. The ordered consensus sequences are shown in FIG. 28.

Mutational Screening on Genomic DNA

34 additional intronic primers were synthesized from the obtained genomic sequences to amplify the 18 exons of the human glomulin gene. Genomic DNA was screened by radioactive SSCP and heteroduplex analysis for Ad-3, Al-14, Ba-10, BG, Chn-200, Del-2, Du-10, Ft-21, Ke-10, Ly-100, R1, Wi-14 and several control individuals, as described (Boon et al., 1999). Amplified fragments were also loaded on denaturing 5% acrylamide sequencing gels to identify potential insertions or deletions. Fragments presenting abnormal migration were re-amplified, purified, and sequenced on a CEQ2000 capillary sequencer (Beckman) (FIG. 18). Furthermore, the novel exon −1 was screened for additional mutations, as described below.

Mutational Screening of the Novel Exon −1

With the forward and reverse primer pair (giving a fragment of 254 bp), exon −1 can be amplified by PCR using genomic DNA extracted e.g. from patients' blood samples or from resected tissues, as template. With SSCP, heteroduplex analysis, sequencing gel size analysis, and sequencing, this fragment was screened in a set of new DNA samples from additional families with vascular phenotypes, glomuvenous malformations, venous malformations and blue rubber bleb nevus syndrome (BRBN), as well as from 2 glomuvenous malformation lesions of the same patient.

Co-segregation of Point Mutations

As most mutations create size differences, sequencing gels were used to assess inheritance in the families. Mutations 107insG, 554del4+556delCCT and 1711delGT were also checked by appropriate digestion (FIG. 18). To identify carriers of the 108C→A mutation (Ba family) that destroys an NsiI cutting site, exon 2 was amplified by PCR and digested with the enzyme. As the mutation 1547C→G (Ft family) does not change any restriction site, a wild-type and a mutant primer for allele-specific PCR (5′-CTG CTT CAT AAT GTG CTT TT(C/G)-3′) (SEQ ID NO 135) were synthesized. These were used in combination with the forward primer of exon 16 (5′-AGT AGG CAA TCA ATC ATT GTT G-3′) (SEQ ID NO 136). Annealing temperature was 58° C. A reverse primer of exon 16 (5′-AAT GGC TTA GCT GTT ATG GTC-3′) (SEQ ID NO 137) was added to the reaction to serve as an internal positive control and as competitor to improve the specificity of the reaction.
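
The allele-specific primer is written with a (C/G) choice at its 3′ end, which stands for two separate oligonucleotides. A minimal sketch of expanding that notation is given below; the function name is hypothetical, and the sketch deliberately does not state which of the two primers matches the wild-type strand, since that depends on the primer orientation.

```python
import re

def expand_degenerate(primer):
    """Expand a single '(X/Y)' choice into the two explicit primer sequences."""
    compact = primer.replace(" ", "")
    match = re.search(r"\(([ACGT])/([ACGT])\)", compact)
    if match is None:
        return [compact]
    return [
        compact[:match.start()] + base + compact[match.end():]
        for base in match.groups()
    ]

# Allele-specific primer from the text (SEQ ID NO 135), written 5' to 3'
allele_specific = expand_degenerate("CTG CTT CAT AAT GTG CTT TT(C/G)")
```

The two resulting primers are identical except for the 3′-terminal base, which is the position that confers allele specificity during extension.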

Polyclonal Antisera Against Human Glomulin Peptides

Peptides

On the basis of the deduced amino acid sequence of human glomulin, two 16 amino acid peptides were synthesized by Eurogentec (Seraing, Belgium): CVPYSKEQIQMDDYGL (SEQ ID NO 138) and CEIKTKSTSEENIGIK (SEQ ID NO 139) (called 207 and 208, respectively, FIG. 32). The peptides were coupled to a BSA carrier (Eurogentec, Seraing, Belgium). Each peptide was injected into two rabbits, following the antibody production program of Eurogentec (Seraing, Belgium). It consists of immunisation of 4 rabbits, with booster injections given every 28 days over a 3 month (84 day) period. Negative serum controls were obtained before injections, and altogether three serum samples were obtained at 43, 71 and 100 days of the program. Final bleeds were obtained at 3.5 months after the beginning of the injections. Titers of the four polyclonal rabbit antisera (452, 453, 454 and 455) were determined by ELISA against the synthesised peptides. Results for 455 are shown in FIG. 33 (Eurogentec, Seraing, Belgium).

Purification of IgG Fractions from Antisera

Aliquots of the antisera were purified using a Protein G Sepharose HITRAP® column (Amersham-Pharmacia). Briefly, the columns were washed with water and equilibrated with supplied binding buffer. Subsequently 5 mL of the serum sample was applied and the columns were washed with the supplied binding buffer until no material appeared in the effluent. The IgG fractions were eluted using the supplied elution buffer (Amersham-Pharmacia). Working dilutions for the antisera were determined by dot blot Western hybridisation. 1, 10 and 100 ng of the synthesised peptides were spotted on nitrocellulose membranes (Amersham) and antisera dilutions 1:500 (FIG. 34A) or 1:4500, 1:13500 and 1:27000 (FIG. 34B) were tested.

Western Blotting

Western blots were done according to the NOVEX WESTERN BREEZE™ protocol of the chemiluminescent Western blotting and immunodetection system (Invitrogen, Germany). Briefly, bacterial or tissue extracts were run in a 24 cm 10% denaturing SDS-PAGE gel at 60 V for 16 hours. Following SDS-PAGE the proteins were transferred to nitrocellulose membranes by electrophoresis at 150 mA for 2 hours.

The immunostainings were performed using a 1:5000 dilution of the purified 455 antiserum or the antisera from 452 and 453. Nonspecific hybridization was blocked by incubating the nitrocellulose membranes for 30 min in a supplied concentrated buffered saline solution containing detergent and concentrated Hammarsten casein solution (Invitrogen, Germany). Alkaline phosphatase-conjugated, affinity purified anti-rabbit IgG was used as the secondary antibody (Invitrogen, Germany). A ready-to-use supplied solution of CDP-START™ (Invitrogen, Germany) mixed with supplied NITRO-BLOCK-II™ (Tropix Inc.) was used as the chemiluminescent substrate for alkaline phosphatase (Invitrogen, Germany). Exposures were made on Kodak Biomax™ films (Amersham-Pharmacia) for 30″ to 10′.

COOMASSIE Stains

For COOMASSIE staining of the protein size standards, gels were incubated in COOMASSIE BRILLIANT BLUE™ 250R (Sigma, USA) for 45 minutes and washed 2 times in decolouring agent (13% alcohol, 13% methanol and 4% acetic acid) for 1 hour, followed by a third wash performed overnight. The following day, the gels were rinsed for a minimum of 3 hours in water.

Bacterial Expression Constructs, Expression, and Extractions

The full-length human glomulin cDNA was cloned in fragments into the multiple cloning site (MCS) of the high-copy plasmid pBLSK+ between the Sal I and Bgl II restriction sites (Stratagene, Belgium). The integrity of the sequences was confirmed by sequencing. The glomulin cDNA was then modified by PCR, using this clone as template, with specific primers to create 5′ Nde I (GGA GAA ATA CAT ATG GCT GTA G) (SEQ ID NO 140) and 3′ Bam HI (AAC CCT ATT TCA CTT TCA CCT AGG AC) (SEQ ID NO 141) restriction sites. The purified PCR product was ligated into the Eco RV blunt end restriction site in the MCS of pBLSK+ vector (Stratagene, Belgium). After sequencing the insert, to ensure that the open reading frame of glomulin was free of mutations, glomulin cDNA was excised using the introduced Nde I and Bam HI sites, and ligated into the Nde I and Bam HI sites in the MCS of the low-copy pET-3a and pET-15b expression vectors (Novagen, USA). These vectors have the advantage of having the start “ATG” codon directly in the Nde I restriction site, and contain upstream the T7 promoter site for transgene activation. Furthermore, pET-15b possesses a histidine-tag, located on the 5′ (N-terminal) end of the encoded protein (FIG. 35).

For expressing transgenic glomulin, E. coli strain BL21 transformed with pET-15b, containing recombinant glomulin, was plated on LB-agar (1% Tryptone, 0.5% Yeast extract, 1% NaCl, 1.5% Agar, pH 7.4) containing the antibiotics chloramphenicol (25 μg/ml) and ampicillin (100 μg/ml). Fresh colonies were selected and 20 mL of LB (1% Tryptone, 0.5% Yeast extract, 1% NaCl, pH 7.4) precultures were grown overnight. The following day, 100 mL of LB or M9 minimal salt medium (5×M9 salts [6.4% Na₂HPO₄, 1.5% KH₂PO₄, 0.25% NaCl, 0.5% NH₄Cl], 1 M MgSO₄, 20% glucose, 1 M CaCl₂) containing chloramphenicol and ampicillin was inoculated with 4 mL from the preculture. Preliminary expression experiments with this system demonstrated that LB growth medium produced bacteria expressing greater amounts of glomulin, which convinced the investigators to abandon the use of M9 growth medium for all subsequent experiments.

A plasmid miniprep (BioRad, USA), followed by Nde I/Bam HI double digestion and agarose gel electrophoresis, was performed on the precultures in order to ensure that the glomulin insert was still present. Positive cultures were grown at 37° C. for roughly 2 hours to obtain an absorbance at 600 nm of 0.5, at which point the cultures were cooled on ice for 20 minutes and separated into 2 flasks, one containing 15 mL and the other 85 mL. At this point, the 85 mL culture was induced with Isopropyl-β-D-thiogalactopyranoside (IPTG), and the 15 mL culture was used as an uninduced control. Both culture flasks were then returned to an incubator to grow. Various temperatures (37° C., 22° C., and 16° C.) were assayed, and it was observed that glomulin production was best at 22° C. Thus, 22° C. was the temperature focused upon for the remaining experiments.

Four (4) mL aliquots were taken at various time points (e.g. 0, 3, 5 and 8 hours) to assess the expression level of the glomulin construct. Cells were pelleted and resuspended in lysing buffer (20 mM potassium phosphate pH 7.4, 5 mM EDTA, 1 mM dithiothreitol, 1 mg/mL lysozyme, 2.5 μg/mL leupeptin, 2.5 μg/mL antipain, and 0.5 mM phenylmethylsulfonylfluoride (PMSF)). Cells were then lysed by freeze-thawing 3 times between liquid nitrogen and a 37° C. heating block. Bacterial DNA was removed by DNase digestion for 1 hour at 4° C. (5 μg/mL DNase with 0.1M MgSO₄). Cell debris and inclusion bodies were then removed by centrifugation at 13,000 g for 30 minutes at 4° C. in an Eppendorf microcentrifuge. Supernatant and pellet fractions were stored at −20° C. Protein levels were quantitated using the BCA-200™ kit from Pierce (Rockford, USA).

Eukaryotic Protein Extraction

Proteins were extracted from tissues frozen and stored at −80° C. First, the chosen tissues were transferred to liquid nitrogen. Tissues were then crushed in a sub-zero metal cylinder with a mallet, and weighed on a scale. Filter-sterilized Camiolo extraction buffer pH 7.4 (0.0075M potassium acetate, 0.3M sodium chloride, 0.1M L-arginine basic salt, 0.01M EDTA and 0.25% Triton X-100) was added in the amount of 1 mL per 100 mg of crushed tissue, and the mixture was homogenized for 1 minute with an ULTRA-TURRAX™ T25 (Janke & Kunkel, Germany) tissue homogenizer. After being placed on ice for a minimum of 5 minutes, the homogenized tissue was spun at 3000 rpm for 15 minutes at 4° C. Supernatant and pellet fractions were then separated and protein levels quantitated using the BCA-200 kit from Pierce.

Determination of Protein Concentration

Protein concentrations of prokaryotic and eukaryotic cell and tissue extracts were determined using the BCA-200 Protein Assay Kit from Pierce (Rockford, USA). Briefly, a fresh set of protein standards was made using BSA at concentrations of 2000, 1500, 1000, 750, 500, 250, 125, and 25 μg/mL. Next, 25 μL of each of the protein samples and BSA standards were mixed with 200 μL of the supplied BCA (bicinchoninic acid) working reagent on a microwell plate. The plate was covered and mixed on a vortex, and incubated at 37° C. for 30 minutes. At this point, a colorimetric reaction occurred, in which the copper in the working reagent was reduced from Cu⁺² to Cu⁺¹ by the proteins in the samples. This reaction occurs in a protein concentration dependent manner. Subsequently, the plate was cooled to room temperature and the absorbance at 562 nm was measured on a plate reader. A response curve for BSA was generated (net absorbance at 562 nm vs. protein concentration in μg/mL). The measured absorbance of the test samples was then plotted onto the response curve and unknown protein concentrations were determined.
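
Reading unknown concentrations off the BSA response curve can be sketched as a straight-line fit followed by inversion. This is only an illustration: the absorbance readings below are synthetic values generated from an assumed line, not measured data, and a straight line is a simplification, since BCA response curves typically flatten at high protein concentrations.

```python
# BSA standard concentrations from the text (ug/mL)
STANDARDS_UG_PER_ML = [25, 125, 250, 500, 750, 1000, 1500, 2000]

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration(net_a562, slope, intercept):
    """Invert the fitted line to estimate concentration in ug/mL."""
    return (net_a562 - intercept) / slope

# Synthetic net absorbances at 562 nm (hypothetical, perfectly linear)
synthetic_a562 = [0.001 * c + 0.05 for c in STANDARDS_UG_PER_ML]

slope, intercept = fit_line(STANDARDS_UG_PER_ML, synthetic_a562)
unknown_ug_per_ml = concentration(0.55, slope, intercept)
```

Samples whose absorbance falls outside the range spanned by the standards would need to be diluted and re-measured rather than extrapolated.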

Affinity Column Purification of Glomulin

Glomulin that was expressed in the pET-15b vector containing a histidine tag was column purified using HITRAP® affinity columns (Amersham-Pharmacia), owing to the histidine tag's affinity for metal ions. Briefly, as a washing step, 5 mL distilled water was let through the HITRAP® column dropwise using a syringe. The column was subsequently loaded with 0.5 mL of 0.1M NiSO₄ metal salt solution and washed with distilled water. The column was then equilibrated with 5 mL of start buffer (0.02M sodium phosphate, 0.5M NaCl, pH 7.4), and 5 mL of the sample was applied. The column was then re-washed with 5 mL of start buffer, before applying 2 mL of elution buffer (0.02M sodium phosphate, 0.5M NaCl, 0.5M imidazole, pH 7.4). This led to competitive elution of the histidine-tagged glomulin protein by imidazole, which has a higher affinity for the nickel ions than histidine. Alternatively, pH gradient purifications were performed with less success (results not shown).

Results

Families

The number of affected males in the 12 families is 35 and the number of affected females is 40. This is consistent with the data in Boon et al., 1999, showing no significant sex bias. 59% (26/44) of children from an affected person are also affected, a figure compatible with a dominant disease.
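
As an illustrative check, the counts above can be compared with the 1:1 ratios expected for an autosomal dominant trait with no sex bias. The exact binomial test below is an added illustration, not the analysis used for the families, and it ignores ascertainment bias and reduced penetrance.

```python
from math import comb

def two_sided_binomial_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
             for k in range(trials + 1)]
    observed = probs[successes]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Counts from the text
affected_children, children = 26, 44
affected_males, affected_females = 35, 40

segregation_fraction = affected_children / children
p_segregation = two_sided_binomial_p(affected_children, children)
p_sex_ratio = two_sided_binomial_p(affected_males,
                                   affected_males + affected_females)
```

Under these simplifying assumptions, neither count departs significantly from a 1:1 expectation at the 5% level.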

Linkage

The highest observed two-point LOD scores for the new families were 4.05 for marker D1S2804 (family Lml), 1.69 for marker D1S2776 (family Sch), 0.75 for marker D1S188 (family Bln), 0.56 for marker D1S188 (family Del), 0.56 for marker D1S2776 (Family Ad), 0.52 for marker D1S2776 (family Ba), and −0.18 for marker D1S188 (family Al), all at θ=0.0. For marker D1S188, the LOD scores at θ=0.0 for the families were 3.78 (family Lml), 1.28 (family Sch), 0.75 (family Bln), 0.56 (family Del), 0.32 (family Ba), −0.52 (family Ad) and −0.18 (family Al). The maximum combined LOD score for D1S188 for these seven families is thus 5.99, and, for all 12 families, 18.41 (θ=0.0).
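
Because LOD scores from independent families are additive at a given recombination fraction, the combined score for D1S188 can be reproduced by summing the per-family values listed above. The contribution attributed to the five initial families is inferred here by subtraction and is not a figure stated in the text.

```python
# Two-point LOD scores for marker D1S188 at theta = 0.0, from the text
D1S188_LODS = {
    "Lml": 3.78,
    "Sch": 1.28,
    "Bln": 0.75,
    "Del": 0.56,
    "Ba": 0.32,
    "Ad": -0.52,
    "Al": -0.18,
}

combined_seven = round(sum(D1S188_LODS.values()), 2)

# Combined score reported for all 12 families
COMBINED_TWELVE = 18.41

# Inferred by subtraction: share of the five initial families
implied_initial_five = round(COMBINED_TWELVE - combined_seven, 2)
```

A combined score above 3 is conventionally taken as significant evidence of linkage, a threshold that the seven new families exceed on their own.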

Visual examination of the pedigrees reveals that the disease seems to skip a generation twice (individuals Del5 and Bln104). However, individual Bln104 has not inherited the haplotype linked with the disease, suggesting that he is not a carrier and that his daughter Bln1040, with a single small ventral lesion, is a phenocopy. In contrast, Del5 is an unaffected person who has inherited the haplotype associated with the disease in her family, and she has an affected son. Thus, she is an obligatory carrier and the mutated gene has a reduced penetrance. Similarly, individuals Lml223, Sch1020, Al12 and Bln1070 were recombinant throughout the VMGLOM region and are likely to be unaffected carriers. This could be explained by their age: Lml223 is 14 years old, Al12 is 10 years old, Sch1020 is 2 years old, and Bln1070 is 1 year old. Thus, 5 unaffected carriers are observed among 43 individuals with the disease haplotype in these seven families. Combined with the data in the 5 initial families, where all 38 carriers of the disease haplotype were affected, a penetrance of ˜94% (76/81) can be calculated.
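
The penetrance estimate follows directly from the carrier counts given above, as the following short calculation shows. The variable names are illustrative.

```python
# Carrier counts from the text
carriers_new_families = 43       # individuals with the disease haplotype, 7 new families
unaffected_carriers_new = 5      # Del5, Lml223, Sch1020, Al12, Bln1070
carriers_initial_families = 38   # all affected, 5 initial families

total_carriers = carriers_new_families + carriers_initial_families
affected_carriers = total_carriers - unaffected_carriers_new

# Penetrance = affected carriers / all carriers of the disease haplotype
penetrance = affected_carriers / total_carriers
penetrance_percent = round(penetrance * 100)
```

Since four of the five unaffected carriers are aged 14 years or younger, the estimate is a lower bound on lifetime penetrance if lesions can still appear later.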

Haplotypic analysis of the seven families defined new obligatory recombination events within VMGLOM between markers AFMB337XE1 and D1S188 (affected individual Lml22) on the telomeric part of VMGLOM, and between markers D1S236 and D1S2779 (affected individuals Sch3 and Bln100, and unaffected individual Bln1020) on the centromeric part of the region. This reduces the locus by 2 cM from AFMA205XD5-D1S236 (Boon et al. 1999) to AFMB337XE1-D1S236, a region of about 3 cM.

Haplotype Sharing in VMGLOM

When the linked haplotypes of the 12 families were compared, two distinct haplotypes, haplotype A, shared by 7 families (BI, Bt, Sh, F, T, Bln, Sch), and haplotype B, shared by 4 families (Al, Ba, Del, Ad) were revealed. Haplotype A is shared from D1S2804 to D1S2849, and, in a subset of families, even more telomerically or centromerically (Table 1A). Haplotype B is shared between markers D1S2804 and D1S2868, and, again, telomeric and centromeric extensions are observed in a subset of the families (Table 1B). Family Lml presents a unique haplotype. Within the shared haplotypes, non-shared marker alleles were occasionally observed in some families for markers D1S2804, D1S424, D1S406, 69CA1, 50CA1 and 75CA1 (Table 1A+B).

Control haplotypes were constructed on the basis of the genotypes of 16 father-mother-child triplets, with the assumption that no crossovers have occurred between the markers. Within these haplotypes, the presence of haplotype A, haplotype B, and portions thereof, was looked for (Table 2). Haplotype A (from D1S2804 to D1S2849) was not seen in controls, although three haplotypes may be considered closely related to it (F1-NT, M2-NT, and F14-T). Haplotype B (from D1S2804 to D1S2868) was not seen in controls either, even if the alleles composing this haplotype seem more frequent than those of Haplotype A (Table 2).

Statistical significance of the apparent linkage disequilibrium was assessed using the chi-square independence test. The frequency bias is significant (P<0.01) for seven out of nine markers in the core of the first haplotype (between D1S2804 and D1S2849, Table 1A). This supports the hypothesis of a founder effect for this haplotype, and allowed the locus to be refined further by inferring ancestral recombinations. In contrast, alleles of the second haplotype do not show statistically significant enrichment relative to the general population, and thus the second haplotype is probably due to co-occurrence of frequent alleles by chance (Table 1B). Thus, based on apparent ancestral crossovers in family T for the first haplotype, the VMGLOM locus can be delineated between marker 33CA1 and marker D1S2779. These two markers, and all intervening markers, have been localised on the same non-chimeric 1.48 Mbp YAC 957D9 (Whitehead/MIT database). Naturally, the possibility that the apparent crossovers in markers 33CA1 and D1S1170 in family T are actually the consequence of marker mutations cannot be ruled out, as such mutations were observed inside the core of the first haplotype for markers D1S424, D1S406, 50CA1 and 75CA1 (Table 1A). Taking this possibility into consideration, a very conservative analysis of the data delineates the locus between markers D1S188 and D1S2779.
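To illustrate the kind of test referred to above, the following Python sketch computes a Pearson chi-square statistic for a 2x2 table of allele counts (disease versus control chromosomes, allele present versus absent). The counts are hypothetical and chosen only for illustration; the actual per-marker counts are those of Tables 1 and 2. No continuity correction is applied, and with small expected counts an exact test may be preferable.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p-value). All four marginal totals must be non-zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f. the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# HYPOTHETICAL counts, for illustration only:
# rows = disease chromosomes / control chromosomes,
# columns = carries the shared allele / carries another allele.
chi2, p = chi_square_2x2(7, 0, 4, 60)
print(f"chi2 = {chi2:.2f}, significant at 0.01: {p < 0.01}")
```

A marker would count towards the "seven out of nine" figure when its p-value falls below 0.01, as in this invented example.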

YAC Physical Map

The positional cloning strategy was initiated by creating a YAC-based physical map on the basis of information collected from the Whitehead Institute/MIT database. Eighteen overlapping YAC clones were selected that cover the 5 Mbp area between the polymorphic markers AFMa205xD5 and D1S2775 that define the VMGLOM locus (Boon et al. 1999). The integrity of the clones was checked by PCR amplification of markers #24 to #49 from the contig WC1.14 of the Whitehead/MIT database (FIGS. 2A and 2B). These clones were used for the precise localization of additional STSs selected from various databanks (FIG. 2C) and created from our end-of-clones (FIG. 3 and Table 4). We also identified the position of three polymorphic markers (D1S188, D1S406 and D1S1170) known to localize to this region (Allikmets et al. 1997). In contrast to Allikmets et al. (1997), marker WI-7719 could not be localised to our YAC-map and the order for markers D1S2849 to D1S286 as well as for D1S424 and D1S406 was inverted (FIG. 2). These results were later confirmed with the PAC-map (FIG. 3). For the integrity of the map, each YAC clone was tested for several markers assumed to be located outside the extremities of the clone. Although clones 934G7 and 944B12 are reported to be chimeras, no gaps were found with the marker set used. However, YAC 784H3, also reported to be a chimera, shows at least two gaps (FIG. 2A). Based on the known sizes of the YAC clones, the size of the VMGLOM locus was estimated to be approximately 5 Mbp (751F11, 946C5, 957D9 and 943H8 cover altogether 6.14 Mbp with overlapping parts).

YAC-based STS and Transcript Map

Having previously excluded three known genes in VMGLOM as the mutated gene (Boon et al. 1999), new positional candidate genes needed to be identified. Therefore, more than 80 STSs were selected from several databases (Sanger, Unigene, Science maps '96, '98 & '99) on the basis of their localization by radiation hybrid mapping to the vicinity of the VMGLOM locus. Every marker was first amplified by PCR on six overlapping YACs covering the whole region (736E1, 751F11, 946C5, 957D9, 944B12 and 759D7, FIG. 1A), with genomic DNA as positive control. 48 positive markers were identified. Finer localization of these 48 markers was performed by testing all the YAC clones in the vicinity of the positive ones. Each negative result allowed the exclusion of the area covered by the corresponding clone. Using this strategy, five markers, WI-13478, D1S2779, G32495FS, G31522 and WI-15861, were precisely localized in between existing markers of the YAC-map (FIG. 2B), whereas the 43 other STSs were only roughly localized (FIG. 2C). Six of the STSs correspond to SNPs (WIAF-1748, WIAF-1230, WIAF-1547, WIAF-1393, WIAF-1842 and WIAF-1642). In addition, to identify novel genes in the region, a homology search was done for each marker by Blast analysis and several genes were retrieved: EVI5, breast cancer anti-estrogen resistance 3 (BCAR3), PTPL1-associated RhoGAP (PARG1), peroxisomal 70 kD membrane protein (PXMP1), KIAA0231, RAD2 and Acidic Calponin (FIGS. 2B and 2C).

The identification of haplotype sharing in VMGLOM among 12 families having reduced the candidate region from AFMa205xD5-D1S2775 to D1S1170-D1S2779, the resolution of our YAC map became too low for precise localisation of candidate genes and polymorphic markers. Based on the size of the YAC 957D9 containing both D1S1170 and D1S2779, and thus the whole region showing haplotype sharing, the VMGLOM locus should be less than 1.48 Mbp (FIG. 2). We undertook the creation of a more precise physical map of this locus, using PAC clones.

PAC Map

The Sanger Center, as part of the Human Genome Project, is sequencing human chromosome 1, and has thus already identified several PAC clones from this chromosome. To create a map, their database was first searched for PAC clones with the STSs in the VMGLOM haplotype-shared area. In this way, twenty clones possibly localizing to VMGLOM were found. Each clone was tested by PCR for all the markers in the VMGLOM YAC map between D1S1170 and D1S2779 (FIG. 3). A manual analysis of the results allowed the clustering of the PACs in four contigs (FIG. 3). With a second search in the Sanger database, we picked twenty-three additional PACs. None closed the gaps between the PAC clusters. To join the different PAC-islands, altogether 21 new STSs were generated from the sequences obtained by direct sequencing of the ends of the protruding PAC clones (Table 4). Marker 33SP6, from the centromeric end of clone 775d17, closed the first gap, being positive for the PAC 1090n11 (FIG. 3). Similarly, marker 21SP6 made it possible to bridge clone 981e3 with clone 606m5. However, the novel markers 47SP6 and 17T7 inside the last gap did not reach any clone from the other cluster. Thus, a PAC library screening using the amplified 17T7 as probe was performed. This resulted in the identification of two new PACs: 104o6 and 226k2. These clones bridged the two contigs, which was also confirmed with three new STSs generated from the ends of these clones (70SP6, 70T7 and 75SP6, Table 1). To obtain double coverage for the single-linked point in the map around marker 21SP6, the PAC library was screened for new clones with 21SP6. Clone 103d10, which overlaps with clones 606m5 and 981e3, was identified. This overlap was confirmed with the novel STS 69SP6. All other novel STSs created were located inside the contigs (FIG. 3).
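The clustering described above was performed manually; the same logic can be sketched programmatically by treating two clones as overlapping whenever they are positive for the same STS and then taking connected groups. The Python sketch below is an illustration of that idea, not the procedure actually used. The function name is hypothetical, and the input lists only the marker-clone hits explicitly mentioned in this paragraph, which is far from the full data set of FIG. 3.

```python
from collections import defaultdict

def build_contigs(sts_hits):
    """Group clones into contigs: clones sharing an STS are joined (union-find)."""
    parent = {}

    def find(clone):
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path compression
            clone = parent[clone]
        return clone

    for clones in sts_hits.values():
        clones = sorted(clones)
        if not clones:
            continue
        root = find(clones[0])
        for other in clones[1:]:
            parent[find(other)] = root

    groups = defaultdict(set)
    for clone in list(parent):
        groups[find(clone)].add(clone)
    return sorted(sorted(members) for members in groups.values())

# Only the hits stated in the text; the real map contains many more.
sts_hits = {
    "33SP6": {"775d17", "1090n11"},
    "21SP6": {"981e3", "606m5", "103d10"},
    "17T7": {"104o6", "226k2"},
}
contigs = build_contigs(sts_hits)
print(contigs)
```

With this partial input the three markers yield three separate groups; adding a marker that is positive in clones from two groups merges them, which is how each new end-derived STS closed a gap.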

Novel CA-repeats

To identify new polymorphic markers for linkage and haplotypic analyses, nine PAC clones were selected for CA-repeat screening. Seven of these were not positive for a known CA-repeat (PACs 976013, 606m5, 775d17, 828k3, 617o13, 103d10 and 226k2, FIG. 3) and two (612c19 and 981e3) contained one (D1S2776 or D1S2779, respectively). These PACs were subcloned and the libraries were screened by hybridization with a radiolabelled (GT)16 probe. More than forty positive subclones were sequenced. This enabled the identification of nine different CA-repeats (Table 5). The sub-library from clone 828k3 did not show any clone containing a putative repeat and the eight positive ones from clone 981e3 only revealed the known D1S2776. Three out of 12 from PAC 612c19 were identical to D1S2779. The nine novel markers were tested by PCR for their specificity on genomic DNA. All except 25CA1 gave a unique signal. To determine whether these eight specific STSs were polymorphic, 16 unrelated control individuals were genotyped. Seven markers showed variable allele sizes and heterozygosities (Table 5).

To integrate additional published information into our map, the PAC contig was analyzed for the ten novel markers reported by Roberts et al. (1998) (FIG. 3). Two of these markers, D1S2868 and D1S1870E, were identified to have an inverted localization. The whole map is now covered by 46 clones and 69 STSs of which four are known genes: the Ribosomal protein L5, KIAA0231, the EVI5, from which is derived the NB4S chimerical gene (Roberts et al. 1998), and GFI1, a growth factor independence gene (Roberts and Cowell 1997). In addition, some STSs (G4415; D1S1887E; G35002; GDB:191074, G29243 and WI-20561) represent four putative genes as they correspond to a cDNA or to an EST-cluster.

Selection of Clones for Sequencing

The most efficient way to sequence through the area covered by the PAC-map is to select clones presenting a minimum of overlap. Seven of the 46 clones already exist in the Sanger contig maps and have been selected for sequencing (621b10, 629119, 1014c4, 716f6, 878d9, 976013, and 612c19). To cover the whole region, clones 775d17, 1090k7 or 737e21, 606m5, 103d10, 615c19, 1091c4 and 226k2 should also be selected. To confirm the overlaps, we fingerprinted these 15 clones by HindIII restriction digestion. Fragments of the same size were identified in overlapping clones (FIG. 4).

Cloning of the VMGLOM Gene:

On the basis of the PAC end sequence 33SP6 (unpublished), ESTs homologous to FAP48, as well as the published FAP48 cDNA, the human VMGLOM gene was cloned and sequenced. Sequences obtained from clones were aligned with the published FAP48 sequences and several differences were identified (FIG. 9). Most remarkably, the open reading frame of VMGLOM “long form” was roughly 30% longer than that of FAP48, extending from the published TAG stop codon at position 1339 in the FAP48 sequence to a STOP codon at position 1785 in the VMGLOM “long form” sequence (FIG. 6). This was identified to be due to two mistakes in the published sequence: 1) an extra guanine at position 1565 in the FAP48 sequence, and 2) a missing 85 bp at position 1215-1300.

The gene encoding the VMGLOM “long form” has been named “glomulin” by the inventors. Its genomic structure is further illustrated in FIG. 16. The gene is composed of 18 exons and extends over 55 kbp; exon 1 contains the translation start site and exon 18 the TGA stop codon. The sequence of the unique 5′ RACE product obtained was in accordance with the published FAP48 5′-sequence and confirmed the presence of an in-frame STOP codon, 81 bp before ATG. Northern blot hybridization (FIG. 17) showed one major transcript around 2 kbp (glomulin coding sequence=1785 bp) in 12 human tissues of a Multiple Tissue Northern filter (Clontech). An additional band of ~3 kbp was observed in most of the tissues. However, the identity of this transcript remains unclear, as the 5′ RACE resulted in a single product.

In addition to the VMGLOM “long form”, another VMGLOM cDNA form, with an extra 24 nucleotides in the 5′ end of exon 4, creating a STOP codon at position 295, was identified among the clones (FIG. 7). This VMGLOM “short form” encodes a predicted protein of only 98 amino acids (FIG. 8). In mouse, both forms were also cloned (FIGS. 11 and 13).
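The predicted protein lengths follow from the positions of the stop codons. The Python sketch below makes the arithmetic explicit, assuming that nucleotide numbering starts at the A of the ATG, that the short-form STOP codon begins at position 295, and that the 1785 bp long-form coding sequence includes its stop codon (so the stop begins at position 1783). These numbering conventions are assumptions made for the example, and the function name is illustrative.

```python
def residues_before_stop(stop_start):
    """Amino acids encoded when the stop codon begins at 1-based position
    `stop_start`, with position 1 being the A of the ATG."""
    coding_nt = stop_start - 1
    if coding_nt % 3 != 0:
        raise ValueError("stop codon is not in frame with the ATG")
    return coding_nt // 3

# Short form: STOP codon created at position 295.
short_form_residues = residues_before_stop(295)

# Long form: assumed 1785-nt coding sequence ending with the stop codon.
long_form_residues = residues_before_stop(1785 - 2)

print(short_form_residues, long_form_residues)
```

Under these assumptions the short form gives the 98 amino acids stated in the text, while the long form corresponds to a protein roughly six times longer.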

Further analyses of the genomic structure of the 5′ end of the glomulin gene led to the identification of an additional (19th) exon that was named exon −1 (FIG. 27). This exon was identified using the human genome draft sequences and the investigators' glomulin cDNA 5′ sequences. It was observed that the cDNA sequence, 31 bp upstream of the ATG codon, was not located 31 bp upstream of exon 1 in the genomic draft sequences, but rather 925 bp upstream. This fragment (exon −1) of the cDNA had a consensus splice site at its 3′ end in the genomic sequence and consists of at least 57 bp (the exact transcription start site being currently unknown, thus the exact number may be higher). As the translation start codon (ATG) is located in exon one, this newly identified exon −1 does not contain coding sequences for amino acids of glomulin.

Identification of Mutations

To screen the VMGLOM cDNA for possible mutations by SSCP and heteroduplex analyses, several overlapping fragments were amplified from patients from whom we had total RNA extracted from EBV-transformed lymphoblasts. Fragments showing abnormal migration in either of these gels were reamplified and sequenced using Beckman fluorescent dye-terminator technology and the Beckman CEQ 2000 capillary sequencer. Mutations 1-4 (Table 8; VMGLOM^(ΔAA31,32), VMGLOM^(insG107), VMGLOM^(ΔAAGAA157-161), and VMGLOM^(ΔCAA1180-1182)) were identified.

To screen patients from whom we did not have RNA for mutations in the VMGLOM gene, intronic primers (Table 6) were used. With these primer pairs, all 18 exons were amplified and analysed by SSCP, heteroduplex and sequencing gels. Nine additional mutations 5-10 were identified (Table 8).

Most of the mutations lead to frameshifts and thereafter to premature STOP codons, and thus may cause loss-of-function or dominant-negative effects. As the most 5′ mutation creating a premature STOP occurs already in exon 2, it is very likely that the effect of all the identified mutations is loss-of-function.

All mutations were tested by PCR on genomic DNA from all family members, and were shown to co-segregate with venous malformations with glomus cells. Interestingly, the VMGLOM^(ΔAAGAA157-161) mutation was found in seven of the families with a shared haplotype. Thus, the hypothesis that this haplotype sharing reflects identity by descent, and thus relatedness of these families and sharing of the same mutation, was true for these seven families.

Overview of Identified Glomulin Mutations and Penetrance of these Mutations

Mutational screening of glomulin was performed on cDNA produced either from RNA extracted from resected GVMs (glomuvenous malformations) or from cultured lymphoblasts, or alternatively on genomic DNA. Thirteen different mutations were identified in 18 families and in 1 sporadic patient (FIGS. 16 & 18). Nine of the mutations were deletions or insertions that cause frame-shifts resulting in premature stop codons. Mutation 157delAAGAA was present in all seven families in which the inventors previously found strong evidence for linkage disequilibrium (Irrthum et al., in press), proving the ancestral origin of the identified haplotype. An additional deletion was found in family Chn. It affects an adenine at the +4 position of the consensus donor site sequence of intron 5, and should, thus, interfere with splicing of exon 5, probably resulting in exon skipping. Loss of this 238 bp exon would also modify the reading frame and result in a premature stop codon. In addition, two nonsense mutations were detected: a substitution of 108C by an A in a TGC codon (family Ba) and the replacement of 1547C by a G in a TCA codon (family Ft). The only mutation that would not cause a premature stop codon was a deletion of 3 nt (family Du), equivalent to the removal of an asparagine at position 394. Since no mutations were found in previously published families Ad and Al (Irrthum et al., in press), and the mutations in families Ba and Del are different, the sharing-by-chance of a similar haplotype in these four families was confirmed (Irrthum et al., in press).
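The consequences described for the two substitutions and for the length-changing mutations can be checked with simple codon arithmetic. The Python sketch below assumes that coding-sequence positions are numbered from the A of the ATG; the function names are illustrative and not part of the original analysis.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def locate(nt_pos):
    """Return (codon number, 0-based offset in codon) for a 1-based CDS position."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3

def substitute(codon, nt_pos, ref, alt):
    """Apply a single-base substitution at CDS position nt_pos to its codon."""
    _, offset = locate(nt_pos)
    if codon[offset] != ref:
        raise ValueError("reference base does not match the codon at this offset")
    return codon[:offset] + alt + codon[offset + 1:]

def shifts_frame(length_change_nt):
    """True if an insertion/deletion of this length disrupts the reading frame."""
    return length_change_nt % 3 != 0

# Family Ba: 108C>A in a TGC codon; family Ft: 1547C>G in a TCA codon.
ba_codon = substitute("TGC", 108, "C", "A")
ft_codon = substitute("TCA", 1547, "C", "G")
print(ba_codon in STOP_CODONS, ft_codon in STOP_CODONS)

# 5-nt deletion (157delAAGAA), skipped 238 bp exon, 3-nt deletion (family Du).
print(shifts_frame(5), shifts_frame(238), shifts_frame(3))
```

Under this numbering, position 108 is the third base of codon 36 and position 1547 the second base of codon 516, and both substitutions produce TGA. The 5-nt deletion and the loss of the 238 bp exon shift the frame, whereas the 3-nt deletion does not.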

The co-segregation experiments (FIG. 18) allowed the detection of altogether 19 unaffected carriers and 5 phenocopies. The penetrance of the different mutations varied from 50 to 100%. The combined penetrance for the most common mutation, 157delAAGAA, was 95.6%, whereas the overall penetrance for all mutations was 88.2%. Penetrance increased with age, as the onset of the first lesion varied from birth to puberty. Thus, at 20 years of age, the overall penetrance rose to 96.5%. The fact that the disorder can be expressed as only a single tiny blue lesion anywhere on the body (Boon et al., 1999) creates difficulties in determining who is affected, which partly explains the observed penetrance below 100%.

Cloning of Genomic Fragments of the Mouse Glomulin Gene

The amplification and/or subcloning of genomic fragments of the murine glomulin gene led to the decoding of altogether about 18 kbp of murine genomic sequences. Exon-intron structure of the murine gene was revealed between exons 1 and 7 (FIGS. 28 and 29). Because of the large size of the second intron (about 10 kbp), only partial sequences were obtained (altogether 2.5 kbp) for this intron. From exon 3 until exon 7, all introns were completely sequenced (FIG. 29). Their sizes varied from 1301 bp to over 10 kbp. In addition, using a homology-based search, a novel murine exon −1 was identified based on the novel human exon −1 sequences (FIGS. 27 and 28).

These sequences allow, among others, the construction of precise restriction digestion maps of these parts of the murine glomulin gene. These maps are important, among others, for the in vitro construction of fragments of the murine glomulin gene that could be used for homologous recombination to result e.g. in ES cells that are genetically modified.

Two constructs for such experiments were designed (FIG. 36). The first construct, which contains the LacZ marker gene positioned at the ATG start codon of the glomulin gene, would lead to a glomulin null-allele. In addition, it would allow the study of marker gene expression in vivo under the normal control of the endogenous glomulin promoter, especially in heterozygous mice, in case the homozygous genotype proves lethal.

The second construct was designed to allow conditional knock-out of the glomulin gene. Using the Cre-loxP system, the DNA fragment between the inserted loxP sites can be excised by introduction of the Cre-recombinase. Thus, in mice, murine embryos or ES cells homozygous for this construct, a deficiency of glomulin can be introduced at a given time point. This should be especially helpful for the study of the function of glomulin in various organs, at various developmental time points, and in various pathogenic as well as physiologic processes.

Human Multiple Tissue Expression Dot Blot

All the tissues on the Human Multiple Tissue Expression Dot Blot (MTE™) showed a positive hybridisation signal (FIG. 30). Thus, glomulin seems to be expressed in all the human tissues examined, ranging from cardiovascular tissues to brain parenchyma and carcinoma cell lines. This may reflect the fact that glomulin is widely expressed in several cell types, or that, as blood vessels are present in most tissues, the positive signals are due to the glomulin present in blood vessels. In that case, the detected expression of glomulin in cancers, such as cervical adenocarcinoma (HeLa S3), lung carcinoma epithelial cell line (A549), leukemias (K-562, MOLT-4, and HL-60), Burkitt's lymphomas (Raji and Daudi) and colorectal adenocarcinoma epithelial cell line (SW480), would be due to inappropriate expression, and glomulin could serve as a marker for transformed cells. It may also be that glomulin is expressed by a variety of cell types, and that its expression in cancer undergoes only a qualitative or quantitative alteration, e.g. in expression or concentration, so that it serves as a target e.g. for diagnosis, treatment and prevention. As the embryonic tissues were also positive for glomulin, expression of glomulin occurs already during human embryogenesis.

Human Multiple Tissue RT-PCR Analysis

Multiple human tissues were studied by RT-PCR for the expression of the glomulin gene. The amplified fragments were designed so that a distinction could be made between the amplification product originating from cDNA and the one from contaminating genomic DNA. The primers synthesized were from exons 2 and 6 (primers 15 and 8) for fragment A from the 5′ end of the cDNA (FIGS. 31A and C), and from exons 9 and 12 (FIGS. 31B and D) for fragment B, from the 3′ end of the cDNA. Thus, the size of amplified cDNA is 561 bp, whereas the corresponding genomic fragment would be about 11.5 kbp. Analogously, for fragment B, the size amplified from cDNA is 503 bp, whereas the genomic amplification product has a size about 17 kbp (see FIG. 15). As both amplification products correspond to the expected size of cDNA, they reflect the expression of glomulin in the corresponding tissue (FIG. 32). Tissues tested included: an artery, aorta, heart, placenta, skeletal muscle, skin, cultured smooth muscle cells, umbilical cord, umbilical vein, vena cava, glomuvenous malformation resected from a patient with a known 5 bp mutation resulting in a premature STOP codon in the glomulin gene, kaposiform hemangioendothelioma (KHE), and a venous malformation with as of yet no known mutation in the TIE2/TEK gene (FIG. 31). A plasmid containing glomulin cDNA was used as a positive control, and water as negative control. All tissues showed an amplification product of the expected size of 561 or 503 bp, thus revealing that glomulin is expressed in all the studied tissues. As cultured smooth muscle cells express glomulin, and GVMs with glomulin mutations show altered differentiation of smooth muscle cells (replaced by glomus cells), glomulin is likely to be an important factor for smooth muscle development. As vascular smooth muscle cell phenotypic modulation (“synthetic” versus “contractile”) has been reported during vascular development and disease states (such as in atherosclerotic plaque formation), glomulin may serve as a new target for altering such changes.

Control RT-PCR, using primers specific to glyceraldehyde phosphate dehydrogenase, and glucose-6-phosphate dehydrogenase demonstrated equal concentration of cDNA for every sample (results not shown).

Mouse Developmental Stage RT-PCR Analysis

Glomulin expression was also studied during mouse development by RT-PCR analysis. Total RNAs were extracted from total murine embryos of 10, 14, 16 and 18 days post-coitum. cDNAs were created using the SUPERSCRIPT™ kit (Gibco-BRL) according to the protocol of the manufacturer. Primers used in the amplification were from exons 1 and 3, in the 5′ end of the gene, amplifying a cDNA fragment of 196 base pairs. This fragment covers two exon-intron boundaries. All embryonic time points show an amplification product of the expected size of 196 bp (FIG. 37). Thus, glomulin is expressed already during embryogenesis in the mouse, at least from embryonic day 10 until 18.

Polyclonal Antisera Against Human Glomulin Peptides

Two (#454 and 455) of the four polyclonal antisera created against the two synthesized peptides of glomulin (207 and 208) showed increases in titers on ELISA assays. Both of these antisera were induced with the peptide 208 from the C-terminal end of the glomulin polypeptide sequence. As the titer increase was the best for antiserum #455, this was mainly used in the subsequent assays (FIG. 33).

An estimate for working dilution for the purified IgG fraction was obtained by Western dot blots using the synthesized peptides in varying concentrations as template. Even at dilution 1:4500, the antiserum 455 gave specific results for the low antigen amounts of 10 ng (FIG. 34). All subsequent experiments were performed using 455 in 1:5000 dilution.

The decision to concentrate on 455 was further influenced by the observation that purified IgG fractions obtained from the other 3 antisera showed consistent cross hybridization to an assumed non-specific band at roughly 97 kDa (FIGS. 38-40).

Bacterial Expression Constructs, Expressions and Extractions

To study the glomulin protein in vitro, it was overexpressed in E. coli using pET-3a or pET-15b bacterial expression vectors (Invitrogen BV, The Netherlands) containing a T7 promoter (FIG. 35). These plasmids were transformed into E. coli strain BL21 (donated by the group of Emile Van Schaftingen, Brussels, Belgium) containing the plasmid pLysS, which harbours the gene encoding T7 lysozyme, a natural inhibitor of T7 RNA polymerase that suppresses basal expression. By adding IPTG to the culture medium, the production of T7 RNA polymerase is increased to such an extent that T7 lysozyme can no longer inhibit all of the T7 RNA polymerase. The resulting increase in active T7 RNA polymerase leads to increased expression of the downstream glomulin gene in the pET-3a and pET-15b vector constructs.

The bacterial expressions were performed in LB medium and in M9 low salt medium, at 16° C., 22° C. and 37° C., to identify the best expression conditions. LB at 22° C. gave the largest amount of protein expressed in the soluble fraction, and was thus chosen as the condition for further experiments.

The expressions made it possible to study the specificity of the created antisera to the protein produced by both constructs. The pET-15b “Histidine-tag” construct produces a protein corresponding to the open reading frame of glomulin plus a 6× histidine tag contained in a 20 amino acid hinge (MGSS-HHHHHH-SSGLVPRGSH-glomulin) (SEQ ID NO: 154), whereas the pET-3a “wild-type” construct produces a protein corresponding to the open reading frame of glomulin alone. The advantage of the pET-15b construct is that it is possible to screen the protein product with both the polyclonal antisera (452, 453, 454, or 455) and an antibody against histidine.

Expression Analysis by Western Blotting

The presence of glomulin protein in various human tissues and eukaryotic cell lines, as well as in bacteria expressing the introduced glomulin constructs, was tested by Western blot analysis. These tissues and cell extracts were analyzed with three of the four available antisera, 452, 453 and 455.

Western blots using the purified IgG fraction from the antiserum of 452 and 453 showed a band of 67 kDa in lanes with protein lysates from pET-15b transformed bacteria over-expressing glomulin. This corresponds to the expected size of glomulin with 6×HIS tag (FIGS. 38 and 39). All lanes were loaded with 7 μg of protein as calculated by the BCA-200 assay (Pierce).

Western blots using the purified IgG fraction from the antiserum 455 also showed a 67 kDa protein in lysates from the supernatant fraction of pET-15b transformed bacteria over-expressing glomulin. The concentration of this protein increased in conjunction with increasing growth periods (FIG. 40A). All lanes were loaded with 27 μg of protein as calculated by the BCA-200 assay (Pierce).

Western blot using anti-histidine tag antibody showed specific binding to a 67 kDa protein in lysates from the supernatant fraction of pET-15b transformed bacteria over-expressing glomulin, which increased in concentration in conjunction with increasing growth periods (FIG. 40B). This result confirmed the identity of the protein detected with the polyclonal antisera. All lanes were loaded with 54 μg of protein as calculated by the BCA-200 assay (Pierce).

Western blots with purified 455 show a clear difference between the bacterially expressed, His-tagged, column purified 67 kDa pET-15b protein, the 65 kDa pET-3a protein, and the 58 kDa protein extracted from human tissues (FIG. 42A). The 58 kDa glomulin protein was observed in vena cava, umbilical cord, placenta, heart, aorta, umbilical vein, renal artery, atrium, splenic artery, testicle, left ventricle, right ventricle, supra-renal vena cava, portal vein and inferior vena cava (FIGS. 41B and 42). Also apparent in FIG. 42B is a double band of about 58 and 60 kDa, observed only in aorta. All lanes were loaded with 60 μg of eukaryotic, and 12 μg of prokaryotic protein for A, 45 μg of protein for B, and 50 μg of protein for 43, as calculated by the BCA-200 assay (Pierce).

Interestingly, all these Western blot results revealed that the glomulin protein, although 67 kDa, as expected, in the bacterial histidine-tagged expressions, and 65 kDa, as expected, in the bacterial non-histidine-tagged expression, only had a size of around 58 kDa in the human tissue extracts. This suggests either that it undergoes post-translational processing, such as proteolytic cleavage, or that a shorter protein is translated in eukaryotic cells. As the Western blot analysis identified glomulin protein in vena cava, umbilical cord, placenta, heart, aorta, and umbilical vein, it is clear that it is present not only in veins, but in other vessels, too. Especially elevated quantities were observed in heart structures. In addition, veins seem to contain more glomulin than other vessels. Thus, glomulin may have a specific function in vein morphogenesis and/or maintenance.
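The expected sizes of the bacterially expressed proteins can be approximated from their lengths. The Python sketch below uses the common rule of thumb of about 110 Da per amino acid residue and assumes a 594-residue glomulin (1785 bp coding sequence including the stop codon); both figures are assumptions made for this illustration, and the names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0              # rule-of-thumb average, an assumption
HIS_TAG_LEADER = "MGSSHHHHHHSSGLVPRGSH"  # pET-15b leader as given in the text
GLOMULIN_RESIDUES = 594                  # assumed from the 1785 bp coding sequence

def approx_mass_kda(n_residues):
    """Rough molecular mass in kDa from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0

untagged_kda = approx_mass_kda(GLOMULIN_RESIDUES)
tagged_kda = approx_mass_kda(GLOMULIN_RESIDUES + len(HIS_TAG_LEADER))

# Number of residues that a 58 kDa species would correspond to on this scale.
residues_at_58_kda = round(58_000 / AVG_RESIDUE_MASS_DA)

print(f"{untagged_kda:.1f} kDa, {tagged_kda:.1f} kDa, ~{residues_at_58_kda} residues")
```

On this rough scale the two bacterial products land close to the observed 65 and 67 kDa bands, whereas the 58 kDa species from human tissues would correspond to a protein some 65 to 70 residues shorter, in line with the processing or alternative-translation hypotheses above.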

REFERENCES

-   Allikmets, R., N. Singh, H. Sun, N. F. Shroyer, A. Hutchinson, A. Chidambaram, B. Gerrard, L. Baird, D. Stauffer, A. Peiffer, A. Rattner, P. Smallwood, Y. Li, K. L. Anderson, R. A. Lewis, J. Nathans, M. Leppert, M. Dean, and J. R. Lupski. 1997. A photoreceptor cell-specific ATP-binding transporter gene (ABCR) is mutated in recessive Stargardt macular dystrophy. Nat Genet. 15: 236-246.
-   Boon, L. M., P. Brouillard, A. Irrthum, L. Karttunen, M. L. Warman, R. Rudolph, J. B. Mulliken, B. R. Olsen, and M. Vikkula. 1999. A gene for inherited cutaneous venous anomalies (“glomangiomas”) localizes to chromosome 1p21-22. Am J Hum Genet. 65: 125-133.
-   Boon, L. M., J. B. Mulliken, M. Vikkula, H. Watkins, J. Seidman, B. R. Olsen, and M. L. Warman. 1994. Assignment of a locus for dominantly inherited venous malformations to chromosome 9p. Hum Mol Genet. 3: 1583-1587.
-   Brouillard, P., Olsen, B. R. & Vikkula, M. 2000 High resolution physical and transcript map of the locus for venous malformations with glomus cells (VMGLOM) on chromosome 1p21-22. Genomics 67, 96-101.
-   Calvert, J. T., T. J. Riney, C. D. Kontos, E. H. Cha, V. G. Prieto, C. R. Shea, J. N. Berg, N. C. Nevin, S. A. Simpson, K. A. Pasyk, M. C. Speer, K. G. Peters, and D. A. Marchuk. 1999. Allelic and locus heterogeneity in inherited venous malformations. Hum Mol Genet. 8: 1279-1289.
-   Gallione, C. J., K. A. Pasyk, L. M. Boon, F. Lennon, D. W. Johnson, E. A. Helmbold, D. S. Markel, M. Vikkula, J. B. Mulliken, M. L. Warman, et al. 1995. A gene for familial venous malformations maps to chromosome 9p in a second large kindred. J Med Genet. 32: 197-199.
-   Ioannou, P. A. and P. J. de Jong. 1996. Construction of bacterial artificial chromosome libraries using the modified P1 (PAC) system. In Current Protocols in Human Genetics (eds. Dracopoli et al.) Unit 5.15. John Wiley and Sons, NY.
-   Irrthum, A. et al. Linkage disequilibrium narrows locus for venous malformation with glomus cells (VMGLOM) to a single 1.48 MBP YAC. Eur J Hum Genet, in press.
-   Klockars, T., M. Savukoski, J. Isosomppi, M. Laan, I. Jarvela, K. Petrukhin, A. Palotie, and L. Peltonen. 1996. Efficient construction of a physical map by fiber-FISH of the CLN5 region: refined assignment and long-range contig covering the critical region on 13q22. Genomics 35: 71-78.
-   Lathrop, G. M., Lalouel, J. M., Julier, C., Ott, J. 1984. Strategies for multilocus linkage in humans. Proc. Natl. Acad. Sci. USA 81: 3443-3446.
-   Lee, W. C., B. Balsara, Z. Liu, S. C. Jhanwar, and J. R. Testa. 1996. Loss of heterozygosity analysis defines a critical region in chromosome 1p22 commonly deleted in human malignant mesothelioma. Cancer Res 56: 4297-4301.
-   Paavola, P., K. Avela, N. Horelli-Kuitunen, M. Barlund, A. Kallioniemi, N. Idanheimo, M. Kyttala, A. de la Chapelle, A. Palotie, A. E. Lehesjoki, and L. Peltonen. 1999. High-resolution physical and genetic mapping of the critical region for Meckel syndrome and Mulibrey Nanism on chromosome 17q22-q23. Genome Res 9: 267-276.
-   Roberts, T., O. Chemova, and J. K. Cowell. 1998. NB4S, a member of the TBC1 domain family of genes, is truncated as a result of a constitutional t(1;10)(p22;q21) chromosome translocation in a patient with stage 4S neuroblastoma. Hum Mol Genet. 7: 1169-1178.
-   Roberts, T. and J. K. Cowell. 1997. Cloning of the human Gfi-1 gene and its mapping to chromosome region 1p22. Oncogene 14: 1003-1005.
-   Sheffield, V. C., M. E. Pierpont, D. Nishimura, J. S. Beck, T. L. Burns, M. A. Berg, E. M. Stone, S. R. Patil, and R. M. Lauer. 1997. Identification of a complex congenital heart defect susceptibility locus by using DNA pooling and shared segment analysis. Hum Mol Genet. 6: 117-121.
-   Vikkula, M., L. M. Boon, K. L. Carraway, 3rd, J. T. Calvert, A. J. 
    Diamonti, B. Goumnerov, K. A. Pasyk, D. A. Marchuk, M. L.     Warman, L. C. Cantley, J. B. Mulliken, and B. R. Olsen. 1996.     Vascular dysmorphogenesis caused by an activating mutation in the     receptor tyrosine kinase TIE2. Cell 87: 1181-1190. -   Vikkula, M., L. M. Boon, J. B. Mulliken, and B. R. Olsen. 1998.     Molecular basis of vascular anomalies. Trends in Cardiovascular     Medicine 8: 281-292. -   Barany, F (1991). Genetic disease detection and DNA amplification     using cloned thermostable ligase. Proc. Natl. Acad. Sci. USA, 88,     189-193. -   Compton, J (1991). Nucleic acid sequence-based amplification.     Nature, 350, 91-92. -   Duck, P. (1990) Probe amplifier system based on chimeric cycling     oligonucleotides. Biotechniques, 9, 142-147. -   Guatelli, J C; Whitfield, K M; Kwoh, D Y; Barringer, K J, Richman, D     D; Gingeras, T R (1990). Isothermal, in vitro amplification of     nucleic acids by a multienzyme reaction modeled after retroviral     replication. Proc. Natl. Acad. Sci. USA, 87, 1874-1878. -   Kwoh, D; Davis, G; Whitfield, K; Chappelle, H; Dimichele, L;     Gingeras, T. (1989). Transcription-based amplification system and     detection of amplified human immunodeficiency virus type 1 with a     bead-based sandwich hybridization format. Proc. Natl. Acad. Sci.     USA, 86, 1173-1177. -   Kwok, S., Kellog, D., McKinney, N., Spasic, D., Goda, L.,     Levenson, C. and Sinisky, J. (1990). Effects of primer-template     mismatches on the polymerase chain reaction: Human immunodeficiency     views type 1 model studies. Nucl. Acids Res., 18: 999. -   Landgren, U; Kaiser, R; Sanders, J; Hood, L. (1988). A     ligase-mediated gene detection technique. Science 241, 1077-1080 -   Lizardi, P; Guerra, C; Lomeli, H; Tussie-Luna, I; Kramer, F (1988).     Exponential amplification of recombinant RNA hybridization probes.     Bio/Technology 6, 1197-1202. -   Lomeli, H; Tyagi, S; Printchard, C; Lisardi, P; Kramer, F (1989).     
Quantitative assays based on the use of replicatable hybridization     probes. Clin. Chem., 35, 1826-1831. -   Walker, G; Little, M; Nadeau, J; Shank, D (1992). Isothermal in     vitro amplification of DNA by a restriction enzyme/DNA polymerase     system. Proc. Natl. Acad. Sci. USA, 89, 392-396. -   Wu, D; Wallace, B. (1989). The ligation amplification reaction     (LAR)—amplification of specific DNA sequences using sequential     rounds of template-dependent ligation. Genomics, 4, 560-569. 

1. An isolated nucleic acid molecule selected from the group consisting of: (a) a nucleic acid molecule encoding a human polypeptide having the sequence of SEQ ID NO 2, (b) a nucleic acid molecule encoding a human polypeptide having the amino acid sequence of amino acid position 405 to 594 in SEQ ID NO 2, (c) a nucleic acid molecule encoding a human polypeptide consisting of the amino acid sequence of SEQ ID NO 4, (d) a nucleic acid molecule having the nucleotide sequence of SEQ ID NO 1 or 3, and (e) a nucleic acid molecule consisting of the 85 bp nucleotide sequence of position 1253 to 1337 in SEQ ID NO 1, (f) a modified SEQ ID NO: 1 having a mutation selected from the group consisting of deletion of 2 nucleotides, positions 31-32; insertion of a G, position 107; substitution of a C by an A, position 108; deletion of 5 nucleotides, positions 157-161; insertion of an A, position 423; deletion of 4 nucleotides, positions 554+556−558; deletion of the 5th nucleotide (G) in the splice-site consensus in intron 5 (5′ end of intron); deletion of 4 nucleotides, positions 842-845; deletion of 3 nucleotides, positions 1179-1181; deletion of a T, position 1355; deletion of 4 nucleotides, positions 1470-1473; substitution of a C by a G, position 1547; and deletion of GT, positions 1711-1712, wherein numbering of said mutations refers to the nucleotide numbering as used in SEQ ID NO: 1, where +1 is the A of the ATG codon at positions 39 to 41, and (g) a nucleic acid molecule comprising the nucleotide sequence set forth by SEQ ID NO: 143 with a deletion of the TA at position 3800-3801 and the full length complement thereof.
2. A molecule comprising the isolated nucleic acid according to claim 1 wherein said molecule is incorporated into a diagnostic kit.
3. A composition comprising the molecule according to claim 2.
4. A method for detecting the presence of mutations of claim 1 associated with venous malformations of glomus cells in a nucleic acid sequence comprising obtaining a sample containing nucleic acids, and determining the presence of mutations in a nucleic acid sequence wherein the nucleic acid sequence is a nucleic acid sequence according to claim 1.
5. An isolated DNA construct comprising a nucleic acid consisting of the nucleic acid according to claim 1.
6. An isolated host cell transformed in vitro with a DNA construct according to claim 5.